An Index of Job Satisfaction *


Arthur H. Brayfield 
Kansas State College 


and 
Harold F. Rothe 


American Hospital Supply Corporation 


Increasingly, business and industrial con- 
cerns are studying the job satisfaction and 
morale of their employees. It has been recog- 
nized that the effectiveness of selection, train- 
ing, and supervisory programs should be 
gauged in part by their effect on employees’ 
satisfaction with their work and that specific 
personnel techniques and procedures should be 
validated against a job satisfaction criterion. 
Unfortunately, adequate indices of job satis- 
faction are difficult to obtain. This report 
describes the construction and validation of 
a quantitative index of job satisfaction which 
has been used as a criterion measure in sub- 
sequent personnel studies to be reported 
elsewhere. 


Construction 


A careful survey of the literature reveals 
that attempts to identify and estimate job 
satisfaction have preceded precise definition. 
Employee satisfaction and morale are often 
equated but seldom defined (4). Hull and 
Kolstad aptly summarize the state of affairs: 
“Although the term ‘employee morale’ is widely 
used, it remains a more or less undefined con- 
cept whose meaning, usually, is simply taken 
for granted. Such definitions as have been 
offered are of little help to the psychologist in 
the construction of items designed to measure 
morale. Thus it was (is) necessary to proceed 
on the basis of subjective judgment” (3: 350). 


* A fuller account may be found in Brayfield’s Ph.D. 
dissertation (1946) on file in the University of Minnesota
library. Special thanks are due Professor Donald
G. Paterson under whose advisorship this study was 
conducted as one portion of the dissertation. 


As a working approach for this study it was 
assumed that job satisfaction could be inferred 
from the individual’s attitude toward his work. 
This approach dictated the methodology— 
attitude scaling. 

An attitude scale elicits an expression of 
feeling toward an object. It may be used 
directly with an individual to obtain such an 
expression. It permits quantification of the 
expression of feeling. These characteristics 
suggest the utility of attitude scaling method- 
ology in developing an index of job satisfaction. 

The following requirements were formulated 
as desirable attributes of an attitude scale 
designed to provide a useful index of job 
satisfaction: 


1. It should give an index to “over-all” job
   satisfaction rather than to specific aspects
   of the job situation.

2. It should be applicable to a wide variety
   of jobs.

3. It should be sensitive to variations in
   attitude.

4. The items should be of such a nature (interesting,
   realistic, and varied) that the
   scale would evoke cooperation from both
   management and employees.

5. It should yield a reliable index.

6. It should yield a valid index.

7. It should be brief and easily scored.


At the time this study was undertaken the 
two most widely known and used attitude 
scaling techniques were those of Thurstone 
(10) and Likert (5). Initially the choice be- 
tween them was made on the basis of a practical 
consideration. Since it would have been extremely
difficult to obtain employed persons as
subjects for the item analysis required by the 
Likert technique, the Thurstone method was 
chosen. The latter method requires a number 
of judges to sort items. On the basis of other 
studies it was assumed that employed persons 
were not necessarily required as judges but 
that almost any mature person could make a 
judgment regarding the “value” of a statement
on an attitude continuum regardless of his 
own job situation (1, 8). 

The construction of this scale was made a 
class project in Personnel Psychology¹ for
members of an Army Specialized Training 
Program in personnel psychology at the Uni- 
versity of Minnesota in the summer and fall of 
1943. Seventy-seven men cooperated. Their 
average age was around 30 years. The 
majority had had at least several years occupa- 
tional experience ranging from unskilled labor 
to professional occupations. The number of 
judges appears adequate (7, 11). 

The class was given instructions similar to 
those outlined by Thurstone for the construction
of items. Approximately 1,000 statements
were turned in by the class and an
additional 75 by the investigators. This collection
was edited and the resulting 246
statements were mimeographed, sorted into
sets, and distributed to the ASTP men for
judging. Each judge sorted the statements
under supervision according to the instructions
suggested by Thurstone. After tabulating the
results the scale and the Q values for each 
statement were determined graphically. Care- 
ful checks were made for accuracy. 
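The tabulation step can be illustrated numerically. The sketch below takes the scale value of a statement to be the median pile assigned by the judges and Q to be the interquartile range of those piles. The original values were read graphically, so this arithmetic shortcut is only an approximation; the eleven-pile sort data and the names `scale_value_and_q` and `sorts` are hypothetical.

```python
from statistics import median, quantiles

def scale_value_and_q(piles):
    """Return (scale value, Q) for one statement.

    piles: pile numbers assigned to the statement, one per judge.
    Scale value = median pile; Q = interquartile range of the piles.
    """
    q1, _, q3 = quantiles(piles, n=4, method="inclusive")
    return median(piles), q3 - q1

# Hypothetical sorts of a single statement by eleven judges (piles 1-11).
sorts = [7, 8, 8, 8, 9, 9, 9, 9, 10, 10, 11]
value, q = scale_value_and_q(sorts)

# Selection rule stated in the text: no item with a Q of 2.00 or above.
keep = q < 2.00
print(value, q, keep)
```

A statement on which the judges scatter widely would give a large Q and would be dropped as ambiguous under the rule above.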

Four specific criteria determined the in- 
clusion or exclusion of items for the preliminary 
scale. First, it was desired to have items cover- 
ing the entire range of the attitude continuum 
at approximately .5 step intervals although the 
statements at the ends of the continuum were 
eliminated as being too extreme to be practical. 
Second, the Q value, which is based on the 
degree of uniformity in the sorting of state- 
ments, was used as an objective measure of 
ambiguity in accordance with Thurstone’s 
recommendations. Consequently no item was 
selected which had a Q value of 2.00 or above. 
Third, from a purely subjective appraisal by
the investigators, items were judged as to
specificity. Items referring to specific aspects
of a job were eliminated since an “over-all”
attitudinal factor was desired; thus items regarding
pay, working conditions, etc., were
eliminated even though it might be argued that
they reflect a general attitude. Finally,
acceptability to employees and management
as judged by the investigators and management
representatives was a criterion. For
example, the item “I am tempted to use illness
as an excuse to stay home from this job” was
typical of those rejected because they seemed
to reflect unfavorably upon the individual.

¹ Class time was made available for this project
through the cooperation of Dr. H. P. Longstaff.

Next a preliminary scale containing eighteen 
selected statements was administered to 10 
employed female office workers and a rank 
order correlation was computed for the odd 
versus even items paired according to Thur- 
stone’s directions. The resulting rho was .31 
which was converted to an estimated product 
moment r and boosted by the Spearman-Brown 
formula to .48. 

This finding indicated a shift in method. 
Since Likert had found that his method of 
scoring attitude scales gave a higher reliability 
than Thurstone’s, his scoring technique was 
adopted and a second experimental scale was 
developed (6).

As a result of experience with the preliminary
administration and additional comments of management,
and in order to replace two “neutral” items,
9 new items were substituted. The resulting
blank contained 18 items with Thurstone scale 
values ranging from 1.2 to 10.0 with approxi- 
mately .5 step intervals. The items were not 
arranged in order of magnitude of scale values. 
The Likert scoring system consisting of five 
categories of agreement-disagreement was ap- 
plied to each item. From the Thurstone scale 
value it was known in what direction to apply 
the new scoring method so that a low total 
score would represent the dissatisfied end of 
the scale and a high total score the satisfied end. 
The items were selected so that the satisfied end 
of the scale was indicated by Strongly Agree 
and Agree for one-half the items and by 
Strongly Disagree and Disagree for the other 
half. The neutral response was Undecided. 
The Likert scoring weights for each item ranged 
from 1 to 5 and the range of possible total 
scores now became 18 to 90 with the undecided 
or neutral point at 54. 
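The scoring rule just described can be sketched as follows. The statements are numbered 1 to 18 in the order printed on the blank. Which nine statements are keyed in the negative direction is not listed in the text; the set `NEGATIVE_ITEMS` below is an assumption inferred from the wording of the statements, and the function names are hypothetical.

```python
# Response categories in the order printed on the blank.
CATEGORIES = ["STRONGLY AGREE", "AGREE", "UNDECIDED",
              "DISAGREE", "STRONGLY DISAGREE"]

# ASSUMPTION: statements worded toward dissatisfaction, so that
# disagreement marks the satisfied end. Inferred from wording only.
NEGATIVE_ITEMS = {3, 4, 6, 8, 10, 11, 14, 16, 18}

def item_score(item_number, response):
    """Score one item from 1 (dissatisfied end) to 5 (satisfied end)."""
    position = CATEGORIES.index(response)   # 0 .. 4
    if item_number in NEGATIVE_ITEMS:
        return position + 1                 # Strongly Disagree scores 5
    return 5 - position                     # Strongly Agree scores 5

def total_score(responses):
    """responses: dict mapping item number (1-18) to the chosen category."""
    if set(responses) != set(range(1, 19)):
        raise ValueError("expected responses to all 18 items")
    return sum(item_score(i, r) for i, r in responses.items())

all_undecided = {i: "UNDECIDED" for i in range(1, 19)}
most_satisfied = {
    i: "STRONGLY DISAGREE" if i in NEGATIVE_ITEMS else "STRONGLY AGREE"
    for i in range(1, 19)
}
print(total_score(all_undecided))    # 54, the neutral point
print(total_score(most_satisfied))   # 90, the top of the range
```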
JOB QUESTIONNAIRE


Some jobs are more interesting and satisfying than others. We want to know how people feel about different 
jobs. This blank contains eighteen statements about jobs. You are to cross out the phrase below each statement 
which best describes how you feel about your present job. There are no right or wrong answers. We should 
like your honest opinion on each one of the statements. Work out the sample item numbered (0). 


0. There are some conditions concerning my job that could be improved. 
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE 


1. My job is like a hobby to me.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


2. My job is usually interesting enough to keep me from getting bored.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


3. It seems that my friends are more interested in their jobs.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


4. I consider my job rather unpleasant.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


5. I enjoy my work more than my leisure time.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


6. I am often bored with my job.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


7. I feel fairly well satisfied with my present job.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


8. Most of the time I have to force myself to go to work.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


9. I am satisfied with my job for the time being.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


10. I feel that my job is no more interesting than others I could get.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


11. I definitely dislike my work.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


12. I feel that I am happier in my work than most other people.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


13. Most days I am enthusiastic about my work.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


14. Each day of work seems like it will never end.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


15. I like my job better than the average worker does.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


16. My job is pretty uninteresting.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


17. I find real enjoyment in my work.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


18. I am disappointed that I ever took this job.
STRONGLY AGREE AGREE UNDECIDED DISAGREE STRONGLY DISAGREE


Fig. 1. Revised job satisfaction blank.
The new scale was administered to 8 additional
employed female office workers and a
rank order correlation computed for the odd 
versus even items. The resulting rho of .61 
was converted to an estimated product-moment 
r and boosted by the Spearman-Brown formula 
to .77. This was believed to be satisfactory for
further experimentation and the revised scale 
was then printed (see Figure 1). 
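The arithmetic behind the step from a rho of .61 to a reliability of .77 can be sketched as below. The text does not say which conversion from rho to product-moment r was used; the sketch assumes Pearson's formula, r = 2 sin(πρ/6), followed by the Spearman-Brown correction for a test of double length. The function names are hypothetical.

```python
import math

def rho_to_r(rho):
    """Estimate a product-moment r from a rank-order rho.

    ASSUMPTION: Pearson's conversion; the source does not name its method.
    """
    return 2.0 * math.sin(math.pi * rho / 6.0)

def spearman_brown_double(r_half):
    """Step a half-test correlation up to the full-length reliability."""
    return 2.0 * r_half / (1.0 + r_half)

r_half = rho_to_r(0.61)
reliability = spearman_brown_double(r_half)
print(round(r_half, 2), round(reliability, 2))   # 0.63 0.77
```

Under this assumption the result agrees with the .77 reported for the second experimental scale.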

Reliability 

The revised scale was administered subsequently
as part of a larger study to 231
employed female office employees in positions
including entry, typing and stenographic, low
and high skill level machine clerical, and
accounting jobs. The blanks were signed
along with other test materials. One of the
investigators personally administered the tests
to employees in small groups. Typically, the
subjects were young, unmarried girls without
dependents. The average girl in the sample
had completed 12 years of schooling. She had
been on her present job for more than one year
and had been employed by the company for
one and three-fourths years. The range of
job satisfaction scores for this sample was
35-87. The mean score was 63.8 with an
S.D. of 9.4.

The odd-even product moment reliability
coefficient computed for this sample was .77
which was corrected by the Spearman-Brown
formula to .87.

Validity

The nature of the individual items is partial,
although not crucial, evidence for the validity
of the scale. This is an appeal to “face”
validity.

Additional evidence of a like nature is
furnished by the method of constructing the
scale. In developing the scale an attitude
variable was specified. This was job satisfaction
which was to be inferred from verbal
reactions to a job expressed along a favorable-unfavorable
continuum. The statements used
in this scale uniformly had small Q values
which indicates a marked consistency among
the judges. With the attitude variable specified,
77 adult judges were consistent in saying,
“This statement expresses a feeling of satisfaction
or dissatisfaction with a job and should
be placed at such and such a point along such a
feeling continuum.”

A more rigorous requirement for validation 
is to use an outside criterion. An attempt was 
made to provide such a criterion. The job
satisfaction blank was administered to 91
adult night school students in classes in
Personnel Psychology at the University of
Minnesota² during 1945 and 1946. Responses
were anonymous. The group included 49 
males and 42 females. The age range was 
from 22 to 54 with a median of 35 years. 
Practically the entire membership was engaged 
in either clerical, semi-professional and pro- 
fessional, or managerial and supervisory oc- 
cupations. The range of job satisfaction scores 
for this sample was 29-89. The mean score 
was 70.4 with an S.D. of 13.2. 

The common denominator for the members 
of this sample was their enrollment in an 
evening class in Personnel Psychology. En- 
rollment in the class was considered to be an 
overt expression of their interest in personnel 
work. Some indication of the strength of this 
interest is afforded by their continued attend- 
ance in a night class (after a full day’s work) for 
a full semester at the end of which time the 
blank was administered. 

Given this circumstance, the following
assumption may be made: Those persons in the
class employed in occupations appropriate to
their expressed interest should, on the average,
be more satisfied with their jobs than those
members of the class employed in occupations
inappropriate to their expressed interest in
personnel work.

This assumption seemed reasonable to the 
investigators and provided a test of the 
validity of the job satisfaction blank. 

The 91 persons accordingly were divided
into two groups (Personnel and Non-Personnel)
with respect to their employment in a position
identified by pay-roll title as a personnel
function. Four occupationally unidentified
persons were placed arbitrarily in the Non-Personnel
group. In all, 40 persons comprised
the Personnel group; 51 the Non-Personnel. A
comparison was then made between the mean
scores for the two groups on the job satisfaction
blank. The mean for the Personnel group was
76.9 with an S.D. of 8.6 as compared to a mean
of 65.4 with an S.D. of 14.02 for the Non-Personnel
group. This difference of 11.5
points is significant at the 1% level; the
difference between the variances also is significant
at the 1% level. Since Fisher and
Behrens’ d-test is appropriate when significant
differences are found both between two means
and their respective variances, it was applied
and the difference between the means was
found to be significant at the 5% level (9).
Sukhatme’s tables give only the 5% level
values.

² These subjects were made available through the
cooperation of Dr. H. P. Longstaff.

If the original assumption as to the differ- 
ential significance of membership in one or the 
other of the two groups is accepted, these data 
furnish evidence for the validity of the job 
satisfaction blank. 
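The comparison of the two groups can be retraced from the reported summary figures. The sketch below computes the statistic d, which is the mean difference over a standard error built from the two separate variances, along with the ratio of the variances. It does not reproduce the Fisher-Behrens test with Sukhatme's tables: the degrees of freedom shown come from the Welch-Satterthwaite approximation, which is swapped in here as a more commonly available substitute. The function name is hypothetical.

```python
import math

def unequal_variance_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (d, df, F) from summary statistics.

    d  : mean difference divided by its separate-variance standard error
    df : Welch-Satterthwaite approximate degrees of freedom (a substitute
         for the Fisher-Behrens reference distribution used in the paper)
    F  : ratio of the larger to the smaller sample variance
    """
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    d = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    f_ratio = max(sd1, sd2) ** 2 / min(sd1, sd2) ** 2
    return d, df, f_ratio

# Personnel group (n = 40) versus Non-Personnel group (n = 51), as reported.
d, df, f_ratio = unequal_variance_test(76.9, 8.6, 40, 65.4, 14.02, 51)
print(round(d, 2), round(df, 1), round(f_ratio, 2))
```

The separate-variance form matters here because the Non-Personnel group is much more variable than the Personnel group, so a pooled standard error would not be appropriate.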


Comparison with the Hoppock Blank 


Perhaps the most systematic attempt to 
develop an index of job satisfaction was the 
one made by Hoppock in the early 1930’s (2). 
Originally Hoppock tried out a series of simple 
attitude scales as part of an interviewing study 
of 40 employed adults. These scales were 
revised to consist of four items each with seven 
responses at step intervals. Values of 1 to 7 
were assigned arbitrarily to the responses in 
each item, the smaller numbers being assigned 
to the responses indicating dissatisfaction. 
The range of possible total scores was 4 to 28. 
This system of scoring correlated .997 for 301
cases with a system of scale values assigned on 
the basis of z-scores. The corrected split-half 
reliability coefficient for the scale for the same 
301 cases was reported to be .93. The scale 
evidently has been assumed to have “face”
validity. 

The night school classes completed both 
blanks. The product-moment correlation be- 
tween scores on the Hoppock blank (Form 11) 
and on the Brayfield-Rothe blank was .92. 
Although the two blanks were developed by 
different methods and contain items which 
over-lap only slightly they give results which 
are highly correlated. 

A comparison was made between the means 
of the Personnel and Non-Personnel groups on 
the Hoppock blank. The mean for the Personnel
group was 22.2 with an S.D. of 2.6; the
mean for the Non-Personnel group was 19.2 
with an S.D. of 4.0. The difference between 
the means is significant at the 1% level; the 
variances do not differ significantly. 


Summary 


An attitude scale which purports to be an 
index of job satisfaction was constructed by a 
combination of Thurstone and Likert scaling 
methods. A reliability coefficient of .87 was 
obtained for one large group. Evidence for 
the high validity of the blank rests upon the 
nature of the items, the method of construction, 
and its differentiating power when applied to 
two groups which could reasonably be assumed 
to differ in job satisfaction. Scores on this 
blank were highly correlated with scores on 
the Hoppock blank in the sample studied. 


Received November 10, 1950. 
A Note on a Modified Purdue Pegboard


Bernard M. Bass and Ralph E. Stucki 
Louisiana State University 


According to Super (1, 217-220), Tiffin and 
Asher (3) and Surgent (2), the Purdue Peg- 
board promises to be valid for measuring arm- 
and-hand dexterity of a finer type than that 
measured by the Minnesota Rate of Manipula- 
tion Test. It also appears to be a valid meas- 
ure of finger dexterity, more suitable for in- 
dustrial selection than the O’Connor Finger 
and Tweezer Dexterity Tests. It should prove 
of value for selecting employees for fairly pre- 
cise manual jobs involving assembly, machine- 
operation and packing. 

However, as Super (1, 219) pointed out, the 
reliability of the pegboard is lower than is 
desirable. In the preliminary manual, a 
reliability of .62 was given for the composite 
score based on one administration of pin 
placing with the right hand, left hand, and 
both hands. The revised manual reported a 
reliability of .71 for this composite score. For 
the assembly test, the preliminary manual 
reported a reliability of .72 and the revised 
manual reported one of .68. 

To increase the pegboard’s reliability, the
test authors suggested giving 3 administrations
of the test. According to the Spearman-Brown
prophecy formula, reliabilities between
.85 and .90 should result.
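The prophecy formula referred to here can be written for any lengthening factor k. The sketch below applies it with k = 3 to the single-administration reliabilities quoted above from the two manuals, and also inverts it to find the lengthening needed for a target reliability. The function names are hypothetical.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test of reliability r is lengthened k-fold."""
    return k * r / (1.0 + (k - 1.0) * r)

def length_factor_needed(r, target):
    """Lengthening factor k required to raise reliability r to target."""
    return target * (1.0 - r) / (r * (1.0 - target))

# Single-administration reliabilities quoted from the two manuals.
for r in (0.62, 0.71, 0.72, 0.68):
    print(r, round(spearman_brown(r, 3), 2))

# How many times longer a test with r = .71 must be to reach .90.
print(round(length_factor_needed(0.71, 0.90), 2))
```

The projected values fall roughly in the range cited by the test authors, and the inverse form shows why tripling, rather than doubling, is needed when a reliability near .90 is wanted.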

Because three administrations increase the
administrative load threefold in addition to
increasing test length, Super (1, 219) suggested
that the pegboard should be increased in
length by providing 3 rows of holes at each
side of the board, more pins, washers, and
collars, and 90 seconds of working time for
each of the 3 pin-placing tests instead of the
present 30 seconds. He proposed that 2
minutes be allowed for the assembly test.

The reliability of the Purdue Pegboard could 
also be increased by making it a work limit 
test, but this would reduce administrative 
efficiency. 

With the above considerations in mind, a 
modified test was developed twice the length 
of the original. The modified board consisted 
of 4 rows of 25 holes each. Rows were equi- 
distant from each other and were centered on a 
board the same size as the original. Double 
the number of pins, washers and collars 
available for the original board were placed 
in the 4 cups as on the original board. 

Sixty seconds of working time were allotted 
for each of the pin placing tests and 120 
seconds were given for the assembly test. 
Subjects were directed to use the 2 right rows 
first when placing pins with both hands and 
then proceed to the top of the 2 left rows. All 
other directions were the same as for the 
original board. 


Test Administration 


Sixty industrial education students at the 
college level were given two administrations of 
the unmodified board and a week later were 
given two administrations of the modified peg- 
board. 


Table 1

Subtest and Sub-retest Means and Standard Deviations Obtained on the Unmodified and
Modified Purdue Pegboard (N = 60)

                        Means
              Unmodified        Modified
Subscore      Test    Retest    Test    Retest

Left hand     17.1    17.5      32.7    33.8
Right hand    17.8    18.3      34.4    35.3
Both hands    14.5    15.2      27.5    28.5
L+R+B         49.4    50.9      94.5    97.7
Assembly      40.0    42.9      80.1    85.4

Standard Deviations (only one of the four columns is legible; its board and
administration are not identified)

Left hand      3.3
Right hand     3.5
Both hands     3.9
L+R+B          9.0
Assembly       8.2
Table 2

Test-retest Reliability Coefficients for the Unmodified and Modified Purdue Pegboard Subtests (N = 60)

Column headings: Obtained (Unmodified Board, Modified Board); Expected (2 Unmodified
Administrations, 3 Unmodified Administrations). Only three of the four columns of
coefficients are legible; they are listed in the order in which they appear, with
digits and decimal points as extracted.

Subtest
Left hand     66    82    85
Right hand    67    85    86
Both hands    71    87    .90
L+R+B         79    .90   91
Assembly      12    76    90




Results and Conclusions 


Table 1 shows the mean scores and standard 
deviations of scores obtained for the subjects 
on the unmodified and modified boards. Since 
they were a select group, the subjects achieved 
mean scores higher than those of unselected 
groups of college men, veterans and industrial 
men reported by Tiffin and Asher (3). Finding 
the subtest mean scores of the modified board 
less than double the mean scores of the original 
board was attributed to the fact that it was 
easier to slip pins into the more worn holes of 
the unmodified board which was older than the 
modified board. 

Since the subjects were a homogeneous
group, it was not surprising to find a restriction
in range of scores compared to results obtained
by Tiffin and Asher (3) for college men, veterans
and industrial men. The restriction in
range would help to account for the lower-than-expected
reliabilities obtained on the unmodified
Purdue Pegboard shown in Table 2
which displays the test-retest reliability coefficients
for scores made on the unmodified board
and on the modified one. Also shown are the
test-retest reliabilities predicted by the Spearman-Brown
formula on the basis of the unmodified
board reliabilities.


Table 3

Correlations Between the Unmodified and Modified
Purdue Pegboards for Each Subtest (N = 60)

Column headings: Correlation Between Unmodified and Modified Tests; Correlation
Between Unmodified and Modified Retests. Only the coefficients listed under the
Retests heading are legible, with digits as extracted.

Subtest       Retests
Left hand     5S
Right hand    66
Both hands    67
L+R+B         89
Assembly      75

Except for the assembly subtest, doubling 
the length of the Purdue Pegboard by means 
of the modifications described at the beginning 
of this report, yielded subscores whose reliabili- 
ties were high enough to make the subtests 
acceptable (as far as reliability is concerned) 
for use in selection and placement. However, 
it is probable that Super’s proposal to triple 
the length of the test will have to be followed 
if the subscores are to be used for individual 
diagnosis. 

Table 3 indicates the correlations between 
scores made on the first administration of the 
unmodified Purdue Pegboard and the first 
administration of the modified one. It also 
indicates the correlations between the retests of 
the unmodified and modified boards. From 
these results it was inferred that there was
little difference in what the modified and unmodified
boards were measuring since the correlations
between boards were almost as high
as their reliabilities would permit.
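The phrase "as high as their reliabilities would permit" rests on the classical result that the observed correlation between two measures cannot exceed the square root of the product of their reliabilities. The sketch below computes that ceiling and the corresponding correction for attenuation. The three coefficients are hypothetical values chosen for illustration and are not taken from Tables 2 or 3; the function names are also hypothetical.

```python
import math

def max_correlation(rel_x, rel_y):
    """Upper bound on the observed correlation between two measures."""
    return math.sqrt(rel_x * rel_y)

def disattenuated(r_xy, rel_x, rel_y):
    """Observed correlation corrected for unreliability in both measures."""
    return r_xy / max_correlation(rel_x, rel_y)

# Hypothetical values for illustration only; not taken from Tables 2 or 3.
rel_unmodified, rel_modified, r_between = 0.80, 0.90, 0.75

ceiling = max_correlation(rel_unmodified, rel_modified)
corrected = disattenuated(r_between, rel_unmodified, rel_modified)
print(round(ceiling, 3), round(corrected, 3))
```

When the observed between-board correlation lies close to the ceiling, the corrected value approaches 1.00, which is the sense in which the two boards appear to measure the same thing.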


Received November 27, 1950. 


References 


1. Super, D. E. Appraising vocational fitness.
N. Y.: Harper, 1949.

2. Surgent, L. L. The use of aptitude tests in the
selection of radio tube mounters. Psychol.
Monog., 1947, 61, No. 2.

3. Tiffin, J., and Asher, E. J. The Purdue Pegboard:
Norms and studies of reliability and validity.
J. appl. Psychol., 1948, 32, 238-242.


The Minnesota Psycho-Analogies Test in the Selection of 
Psychologists for Public Service 


John S. Pearson 
Minnesota State Division of Public Institutions 


and 


Marvin W. Strate 


Minnesota State Civil Service Department 


The Minnesota Psycho-analogies Test was 
recently introduced by Levine¹ as an instrument
designed to aid in the selection and
evaluation of graduate students in psychology. 
While it is suggested that the test may be of 
value as an instrument for terminal evaluation 
with respect to the granting of degrees, the 
possibility of extending the application of this 
test into selection of personnel for employment 
in the field of psychology has not been raised. 

The writers considered that the psycho- 
analogies might provide a useful adjunct to 
existing techniques for the appraisal of qualifi- 
cations and the ranking of applicants for em- 
ployment as psychologists under the regula- 
tions of the Civil Service Department in the 
State of Minnesota. Accordingly Form A of
the psycho-analogies has been administered to 
all applicants for the position of “psychologist” 
since September, 1949. 

The position of “psychologist” exists under 
state Civil Service regulations to provide psy- 
chological services at various state mental 
hospitals, correctional institutions, schools for 
the mentally deficient, and in several traveling 
clinics. Selection of persons to fill these
positions is made by the Civil Service Department
on the basis of evaluation of training
and experience, written examination, and oral
examination.

The level of training required for admission
to the written examination requires some graduate
work in psychology. The written examination
contains 180 four-response multiple
choice type items covering general background
in psychology, statistics, and knowledge of
standardized tests and techniques important
in psychological work at this level. While the
test has never been validated against an
external criterion, it has a fair degree of face
validity: the items are largely exchange items
in common use by all member agencies of the
Civil Service assembly. The test has adequate
reliability on the basis of internal consistency.

¹ Levine, A. S. Minnesota Psycho-analogies Test.
J. appl. Psychol., 1950, 34, 300-305.

The Civil Service test for psychologists and 
Form A of the psycho-analogies were admin- 
istered to 41 individuals meeting the minimum 
requirements of training and experience. The 
product-moment correlation between the two 
instruments was .85. The mean score for this 
group on the psycho-analogies was 51.4 with 
a standard deviation of 11.8. Restricting the 
sample to 23 individuals holding the Master 
of Arts Degree with a major in psychology, the 
correlation with the Civil Service examination 
remained .85, the mean score on the analogies
rose to 57.1 and the standard deviation 
dropped to 4.8. These figures would appear 
to indicate that the psycho-analogies yielded 
results in close agreement with the longer Civil 
Service examination.

Attempts were also made to evaluate the
psycho-analogies test by using a group of 23
employed psychologists. The 23 individuals
were ranked by a supervisor using the paired
comparison method on a number of elements
of job performance and an over-all rating was
evolved. Forms A and B of the psycho-analogies
were then administered to this group. The results
are summarized in Table 1. A rank
difference correlation of .56 between the over-all
performance rating and the combined scores of
Forms A and B on the psycho-analogies was obtained.
Of the 23 employed psychologists only
three had been away from the academic situation
for five years or more. These three persons
tended to rank consistently high in terms
of performance and relatively low in the psycho-analogies.
By eliminating these three cases a
rank difference correlation of .70 was obtained
between the psycho-analogies and the performance
rating. From this it would appear that
there may be some deterioration of the ability
to recognize the correct answers to the psycho-analogies
related to absence from the classroom
setting but unrelated to performance in
an applied situation. While the rank difference
correlations of .56 and .70 may not appear
to be of great significance, they may be regarded
as rather high in view of the fact that all of the
individuals had completed the same amount
of graduate work in psychology and the fact
that all were performing satisfactorily on the
job. The reliability of the supervisor’s rating
is of course subject to question since only one
supervisor was in a position to rate all of the
individuals, since the work assignments were
not identical, and since the range of individual
differences in performance was relatively restricted.
It is believed, however, that these
factors operated to reduce rather than to raise
the magnitude of the correlation.


Table 1

Means and Standard Deviations on Form A, Form B,
and Forms A and B Combined of the Minnesota
Psycho-analogies for 23 Employed
Psychologists

Forms          Mean

Form A          56.7
Form B          55.3
Forms A+B      112.0
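The rank difference correlation reported above is Spearman's coefficient, computed from the squared differences between paired ranks. The sketch below shows the calculation and how a single case that ranks high on performance but low on the test pulls the coefficient down. The ranks are hypothetical and do not reproduce the authors' data; the function names are also hypothetical.

```python
def rank_difference_correlation(ranks_x, ranks_y):
    """Spearman's rho from two lists of untied ranks."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rank lists of equal length (at least 2)")
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1.0 - 6.0 * sum_d2 / (n * (n ** 2 - 1))

def rerank(values):
    """Convert untied values to ranks 1..n (smallest value gets rank 1)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Hypothetical ranks for six people (1 = best). The last person ranks
# first on performance but last on the test.
performance = [2, 3, 4, 5, 6, 1]
test_score = [1, 2, 3, 4, 5, 6]

all_rho = rank_difference_correlation(performance, test_score)

# Drop the discordant case and re-rank the remaining five.
trimmed_rho = rank_difference_correlation(
    rerank(performance[:-1]), rerank(test_score[:-1])
)
print(round(all_rho, 2), round(trimmed_rho, 2))
```

In this made-up case one discordant person lowers the coefficient sharply, which parallels the rise from .56 to .70 when the three long-absent psychologists were set aside.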
There is no evidence to suggest that success
on the psycho-analogies is dependent upon 
doing graduate work at the University of 
Minnesota. Eight graduates of 8 institutions 
other than Minnesota had a mean combined 
score less than one point lower than that of the 
15 Minnesotans. The product-moment cor- 
relation between Forms A and B for this 
group of 23 was .81. 


Summary 


The Minnesota Psycho-analogies Test shows 
promise of extended application by the person 
faced with the problem of objectively ranking 
inexperienced applicants for employment in the 
field of psychology. This preliminary in- 
vestigation has shown that the test gives 
satisfactory agreement with an instrument in 
use for selection of psychologists by a state 
civil service department. There is some evi- 
dence to suggest that differences in job per- 
formance by employed psychologists are re- 
flected by differences in psycho-analogies scores 
indicating that the inclusion of the psycho- 
analogies in a battery for selection of psy- 
chologists may raise its predictive value. 

The score achieved by a person on the 
psycho-analogies may reflect the length of his 
absence from the academic field of psychology 
rather than his competence in an applied 
situation. 


Received December 11, 1950. 
Job Performance Criteria Studies: I. The Job Performance
of Proofreaders 


C. H. Lawshe and A. D. McGinley, Jr. 
Occupational Research Center, Purdue University 


Far too often investigators who are inter- 
ested in the prediction of job success report 
elaborately designed studies using all of the 
latest statistical innovations only to conclude 
with a statement similar to the following: “It 
seems quite probable that had a better cri- 
terion measure been available a higher correla- 
tion would have been obtained.” Awareness 
of the importance of the criterion has been 
growing. The more recent textbooks are 
giving more emphasis to the problem, and contributions 
like that of Brogden and Taylor¹ are 
significant. However, practice seems still to 
lag far behind. 

This paper is presented with the hope that 
it will serve to emphasize further the im- 
portance of the criterion problem. 


The Problem Situation 


Plans were drawn up for the validation of 
predictors of job performance for the job of 
proofreader in a printing plant. Predictors 
considered were items of biographical data 
taken from the application blank and a battery 
of tests, both of which were to be validated 
against quantity and quality production measures. 
As the data for these validation studies 
were being gathered it was learned that the 
proofreaders had customarily been paid an 
hourly rate with a periodic bonus increment 
based on average hourly production during a 
preceding work period. In 1949, a re-emphasis 
was placed on the accuracy of reading proof, 
and in June of that year a system for measuring 
proofreading errors was instituted. This measure 
of accuracy was then used to govern 
eligibility for bonus pay. The relative newness 
of the systematic measurement of quality and 
the possible effects of relating quality to bonus 
pay posed many questions concerning the use 
of the data as validating criteria, some of 
which must be confronted in any validation 
study, others of which are relatively unique 
to this study. 

¹ Brogden, H. W., and Taylor, E. K. The dollar 
criterion--applying the cost accounting concept to 
criterion construction. Personnel Psychol., 1950, 3, 
133-154. 

One such question stems from the fact that 
there are different dimensions of job perform- 
ance; are these measures correlated? That is, 
do the high quantity producers turn out low 
quality work, or vice versa, or are quality and 
quantity independent of one another? Before 
these measures can be effectively used as 
criteria, their relationship to other variables 
should also be known, and care should be 
taken to discover any contamination that 
might be present. 

A general criterion problem which assumes 
particular importance in the present case is 
that of reliability. A criterion which is not 
itself consistent cannot of course be predicted 
from an external variable. In the present 
study this problem might be complicated by 
the newness of the error measuring system and 
the effects of its institution on the proofreaders. 
That is, if prior to the measurement of errors, 
some of the proofreaders devoted greater at- 
tention to their rate of producing work than 
to their accuracy, it well might be that the 
institution of an error measuring system 
would cause these people to reduce their rate 
of production and their relative position in the 
group would change. On the other hand, 
other employees might have been accurate at 
the expense of speed, and the institution of the 
error measuring system might have raised 
their relative position in terms of production 
rate. 

Because validation studies are so highly 
dependent upon the adequacy of the criterion, 
the present study is confined to an investiga- 
tion of the criterion measures in order to permit 
a more comprehensive evaluation of proposed 
predictors in a later study. 


Procedure and Results 


Inter-relationships. Two criterion measures 
were originally available. The first, Rate of 
Productivity, is the quantity of work produced 
expressed as a per cent of time-study standards. 
The second, Errors, is the number of errors 
made per standard hour worked. 

The criterion data originally gathered in- 
cluded 11 consecutive four-week periods which 
began with the institution of the error meas- 
uring system and continued through April 23, 
1950. One of these periods included the 
Christmas-New Year holiday season and was, 
therefore, not used. 

In order to estimate the independence of the 
two measures the mean Rate of Productivity of 
each of the 18 proofreaders for the 10 periods 
was correlated with mean Errors. The ob- 
tained intercorrelation is presented in Table 1. 
The coefficient obtained between Rate of Pro- 
ductivity and the error measure was .06. Since 
this does not differ significantly from zero it 
appears that Rate of Productivity is a measure 
of job performance which is independent of 
the error measure. 

It was postulated that these criterion measures 
might vary concomitantly with the 
amount of time worked. That is, it seemed 
possible that as the length of the work week is 
decreased the Rate of Productivity might be 
decreased in order to spread the work over a 
full week, whereas if there is assurance of a 
full week’s work, there might be less reason to 
restrict output and Rate of Productivity might 
increase. Furthermore, it might be that as 
the length of work week and Rate of Productivity 
are decreased, the number of Errors 
decreases. Then when the length of the work 
week and the Rate of Productivity increase, 
Errors might also increase. These hypotheses 
were tested by correlating the mean number of 
hours worked per week during each of the 10 
periods with the mean Rate of Productivity and 
mean Errors for corresponding periods. The 
correlation coefficients obtained between these 
two measures and average hours worked 
(presented in Table 1) were −.30 and +.45, 
respectively. Since neither of these differs 
significantly from zero, it was concluded that 
Errors and Rate of Productivity are free of 
contamination by the amount of time worked. 
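
The statement that a coefficient does not differ significantly from zero can be checked with the usual t test for a product-moment correlation, t = r·sqrt(n − 2) / sqrt(1 − r²). The sketch below is an illustration and not the authors' own computation. It assumes N = 18 proofreaders for the correlation between Rate of Productivity and Errors and N = 10 periods for the two correlations with hours worked, and it hard-codes the two-tailed 5% critical values of t for 16 and 8 degrees of freedom. The function and variable names are hypothetical.

```python
import math


def t_for_correlation(r, n):
    """t statistic for testing a product-moment r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Two-tailed 5% critical values of Student's t, keyed by degrees of freedom.
CRITICAL_T_05 = {16: 2.120, 8: 2.306}

# (label, r, n); the n values are assumptions read from the text.
cases = [
    ("Rate of Productivity vs. Errors", 0.06, 18),
    ("Hours worked vs. Rate of Productivity", -0.30, 10),
    ("Hours worked vs. Errors", 0.45, 10),
]

results = {}
for label, r, n in cases:
    t = t_for_correlation(r, n)
    significant = abs(t) > CRITICAL_T_05[n - 2]
    results[label] = (round(t, 2), significant)
    print(f"{label}: r = {r:+.2f}, t = {t:+.2f}, significant at 5%: {significant}")
```

With only 10 periods, even a coefficient of .45 falls short of the tabled critical value, which is consistent with the conclusion drawn in the text.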

Another hypothesis to be examined was that 
the versatility of the proofreaders might in- 
fluence their performance measures. For ex- 
ample, if the more versatile people are given a 
greater number of different kinds of work, as 
contrasted with the specialist who reads only 
one kind of copy, the “better” or more versatile 
people could possibly accumulate more errors 
and/or appear to work at a slower pace because 
of the necessary adjustments to the different 
kinds of work performed. In order to examine 
this hypothesis some measure of versatility was 
required. A paired-comparison rating on ver- 
satility was done by two foremen, an instructor, 
and the department supervisor, each of whom 
was given a stack of cards. The names of two 
proofreaders were typed on each card, and 
there was one card in each stack for each pos- 
sible pair of proofreaders. The raters were in- 
structed to choose one of each pair in answer 
to the following question: “Which of these two 
people is qualified to read the greater number of 
different kinds of work?” The mean ratings 
for each employee were then correlated with 
Rate of Productivity and Errors in order to 
estimate the influence of versatility on these 
criterion measures. 
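
The card procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the proofreader labels are placeholders, the rater is a made-up rule, and scoring each person by the proportion of comparisons in which he or she was chosen is an assumed scoring rule, since the text says only that mean ratings were computed.

```python
from itertools import combinations


def make_cards(names):
    """One card for each possible pair of names."""
    return list(combinations(names, 2))


def tally_choices(cards, choose):
    """Proportion of a rater's comparisons in which each person was chosen."""
    names = {name for card in cards for name in card}
    wins = {name: 0 for name in names}
    for card in cards:
        wins[choose(card)] += 1
    comparisons_per_person = len(names) - 1  # each name appears on this many cards
    return {name: wins[name] / comparisons_per_person for name in names}


proofreaders = [f"P{i:02d}" for i in range(1, 19)]  # 18 placeholder labels
cards = make_cards(proofreaders)
print(len(cards))  # number of cards in one rater's stack

# Hypothetical rater who always chooses the lower-numbered label.
scores = tally_choices(cards, choose=lambda card: min(card))
```

For 18 proofreaders each stack holds 18 × 17 / 2 = 153 cards, so the four raters together made 612 judgments.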

Versatility, measured in this manner, correlates 
with Rate of Productivity and Errors .15 
and −.38, respectively. Since neither of these 
correlations differs significantly from zero, it was 
concluded that, to the extent that the obtained 
ratings are a valid measure of versatility, the 
job performance measures are free of any influence 
from versatility. Because of this independence 
of the versatility ratings, they were 
given further consideration as an additional 
criterion. 

Table 1 

Product-Moment Coefficients of Correlation Between 
Three Criterion Measures and Average 
Hours Worked 

                        Rate of                            Average 
                        Produc-                            Hours 
                        tivity     Errors    Versatility   Worked 

Rate of productivity    (.99)*     .06       .15           −.30 
Errors                             (.95)*    −.38          .45 
Versatility                                  (.95)** 

* Odd-Even Split-Half reliabilities stepped up by 
using the Spearman-Brown Prophecy Formula. 

** Average inter-correlation between four raters 
stepped up by Spearman-Brown Prophecy Formula. 


Table 2 

Inter-Correlations Between Paired-Comparison 
Ratings on Versatility 

                        Rater 
Rater     A         B         C         D 

A         (.84)*    .85       .88       .74 
B                   (.94)*    .85       .86 
C                             (.97)*    .71 
D                                       (.69)* 

* Reliabilities estimated by using Shen’s formula 
taken from Kelley, T. L. Fundamentals of statistics, 
Cambridge: Harvard University Press, 1947, p. 421. 


Reliability. With the addition of Versatility, 
there are three criteria. The stepped up 
reliability coefficients for these criteria are 
presented in parentheses in Table 1. The 
reliability of the Rate of Productivity and Errors 
measures was estimated by correlating each 
individual’s average for odd periods with his 
average for even periods. The periods were 
consecutively numbered for assignment to 
halves and the obtained coefficients were 
stepped up by the Spearman-Brown formula. 
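
The odd-even procedure just described can be sketched in a few lines. The data below are invented solely to exercise the code (four workers, ten periods) and are not the plant's records; the function names are hypothetical. The step-up uses the Spearman-Brown formula for a test of doubled length, 2r / (1 + r).

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r, n=2):
    """Reliability of a measure lengthened n times."""
    return n * r / (1 + (n - 1) * r)


def odd_even_reliability(scores):
    """scores: one list per person, one value per consecutively numbered period."""
    odd = [sum(p[0::2]) / len(p[0::2]) for p in scores]   # periods 1, 3, 5, ...
    even = [sum(p[1::2]) / len(p[1::2]) for p in scores]  # periods 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical Rate of Productivity figures: 4 workers x 10 periods.
made_up = [
    [98, 101, 99, 102, 100, 103, 101, 102, 100, 104],
    [110, 112, 111, 113, 112, 111, 114, 113, 112, 115],
    [90, 88, 91, 92, 89, 93, 90, 91, 92, 94],
    [105, 104, 107, 106, 105, 108, 106, 107, 108, 107],
]
reliability = odd_even_reliability(made_up)
print(round(reliability, 2))
```

Because the step-up is monotonic, a reported stepped-up coefficient can be read backward: a half-with-half correlation r corresponds to a full-length reliability of 2r / (1 + r).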

The reliability coefficient for Rate of Productivity 
is +.99 and for Errors is +.95. Both 
Rate of Productivity and Errors averages based 
upon 10 four-week periods are relatively stable 
measures of job performance. This stability, 
coupled with the fact that the measures appear 
to be independent of each other and of actual 
hours worked, permits the conclusion that 
both may be used in future validation studies 
in this printing plant. 


The estimation of the reliability of Versatility 
consisted of two phases. First, each 
rater was compared with the other three raters 
by using Shen’s formula,² which provided an 
estimate of the reliability of each rater as one 
of a pool of raters. This measure made it 
possible to evaluate each rater and subsequently 
discard the ratings of any judge who 
might appear significantly less reliable than 
the rest. Second, the reliability of combined 
ratings was estimated by stepping up the 
average intercorrelation between raters by the 
Spearman-Brown formula. 

² Kelley, T. L. Fundamentals of statistics. 
Cambridge: Harvard University Press, 1947, p. 421. 

The intercorrelations between raters, which 
are presented in Table 2, ranged from +.71 
to +.88 and averaged +.82. The reliabilities 
of individual raters compared with the other 
raters ranged from +.69 to +.97, and are 
presented in parentheses in Table 2. The 
reliability of the combined ratings estimated by 
stepping up the average intercorrelation is 
+.95.³ This indicates satisfactory reliability, 
and because Versatility is unrelated to Rate of 
Productivity and Errors, it was concluded that 
Versatility ratings may be used as a third and 
independent criterion of job performance in 
future validation studies. 
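
The second phase can be retraced from the six inter-rater coefficients in Table 2. The sketch below is an illustration only; it assumes the general Spearman-Brown formula, n·r / (1 + (n − 1)·r), applied to the mean intercorrelation, with n equal to the number of raters pooled. The helper names are hypothetical.

```python
def spearman_brown(r, n):
    """Reliability of the pooled ratings of n raters with mean intercorrelation r."""
    return n * r / (1 + (n - 1) * r)


# Inter-rater correlations as read from Table 2.
PAIRS = {
    ("A", "B"): 0.85, ("A", "C"): 0.88, ("A", "D"): 0.74,
    ("B", "C"): 0.85, ("B", "D"): 0.86, ("C", "D"): 0.71,
}


def pooled_reliability(raters):
    """Mean intercorrelation among the given raters and its stepped-up value."""
    chosen = set(raters)
    rs = [r for pair, r in PAIRS.items() if set(pair) <= chosen]
    mean_r = sum(rs) / len(rs)
    return mean_r, spearman_brown(mean_r, len(chosen))


mean_four, rel_four = pooled_reliability("ABCD")
mean_three, rel_three = pooled_reliability("ABC")  # the three most reliable raters
print(f"four raters:  mean r = {mean_four:.3f}, stepped up = {rel_four:.3f}")
print(f"three raters: mean r = {mean_three:.3f}, stepped up = {rel_three:.3f}")
```

To two decimals both pools give the same stepped-up value, which agrees with the decision to retain the ratings of all four raters.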

Level of Performance. As was indicated 
earlier, the collection of data was begun 
simultaneously with the installation of a 
method of payment that included error measurements 
as one factor. Prior to that time 
bonus pay had been determined purely on a 
quantity basis (Rate of Productivity alone was 
used). It was hypothesized that a new 
emphasis on errors might result in an immediate 
drop in Rate of Productivity, followed either 
by complete or partial recovery. Changes in 
Errors might appear concomitantly. In other 
words, the period of time involved in the study 
might be constituted in whole or in part of a 
“learning period” during which the employees 
were adapting to the new system. The high 
correlations between performance during the 
odd and even periods for both Rate of Productivity 
and Errors would seem to indicate that 
a given individual tends to perform at about 
the same relative level period after period. 
However, they provide no evidence regarding 
the stability of the absolute level of performance 
from period to period. If trends are 
present, it would not be prudent to use an 
average rate of producing work or an average 
error measure for all of the 10 four-week 
periods as true indices of job performance level 
inasmuch as these averages would be contaminated 
by the learning problem. For 
these reasons the group means of each of the 
criterion measures for each of the 10 periods 
were plotted graphically (Figures 1 and 2). 
Then the data were processed by analysis of 
variance to determine whether the means differed 
significantly from one another (Table 3). 

³ The difference between the individual reliabilities of 
the least reliable and second least reliable raters is .15, 
which difference raises the question of the effect of the 
least reliable rater’s ratings on the combined ratings. 
This effect was estimated by stepping up the average 
intercorrelation among the three most reliable raters, 
and since this reliability estimate was no higher than 
that for four raters, all four ratings were used in computing 
combined ratings. 

Fig. 1. The average Rate of Productivity of 18 journeymen 
proofreaders for ten four-week periods (the 
rate of productivity is measured production expressed 
as a per cent of time-study standards). 

Fig. 2. The average number of Errors made per 
standard hour by 18 journeymen proofreaders for ten 
four-week periods. (Vertical axis: errors per standard 
hour; horizontal axis: four-week periods.) 

While no trend seems evident in the Errors 
curve (Figure 2), there does appear to be an 
upward trend in Rate of Productivity (Figure 1). 
The obtained F ratios for Rate of Productivity 
and Errors indicate column differences sig- 
nificant at the 5% confidence level. These 
would not ordinarily be considered significant 
differences, particularly when there is no equipment 
or machinery involved in the work of 
proofreading and the highly variable “human 
element” is the dominating factor. However, 
because these data were gathered under the 
conditions described above, it would seem 
advisable to withhold any conclusions until 
more time has elapsed, and the effect of such 
possible sources of variance might be more 
adequately evaluated with the collection of 
data from subsequent work periods. 
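
The F ratios discussed above can be retraced from the sums of squares reported in Table 3. The sketch below assumes a two-way layout without replication in which the columns are the 10 four-week periods and the rows are the 18 proofreaders, giving 9, 17, and 153 degrees of freedom; that layout is an inference from the reported variance estimates and is not stated outright in the text. The function name is hypothetical.

```python
def f_ratios(ss_columns, ss_rows, ss_error, n_columns, n_rows):
    """Column and row F ratios for a two-way layout with one entry per cell."""
    df_columns = n_columns - 1
    df_rows = n_rows - 1
    df_error = df_columns * df_rows
    ms_error = ss_error / df_error
    f_columns = (ss_columns / df_columns) / ms_error
    f_rows = (ss_rows / df_rows) / ms_error
    return f_columns, f_rows


# Sums of squares as read from Table 3.
rate_col_f, rate_row_f = f_ratios(1534.06, 33346.39, 12452.67, n_columns=10, n_rows=18)
err_col_f, err_row_f = f_ratios(0.205, 1.279, 1.762, n_columns=10, n_rows=18)

print(f"Rate of Productivity: F(columns) = {rate_col_f:.2f}, F(rows) = {rate_row_f:.2f}")
print(f"Errors:               F(columns) = {err_col_f:.2f}, F(rows) = {err_row_f:.2f}")
```

The column ratios test differences among period means (the trend question), while the row ratios reflect differences among proofreaders.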


Summary and Conclusions 


Data on two criterion measures of job per- 
formance, Rate of Productivity and Errors, were 
obtained for 18 journeyman proofreaders for 
10 four-week periods following a re-emphasis 
on accuracy and the institution of an error 
measuring system. Intercorrelations were com- 
puted, using the means for the ten periods. 

The relationship of the amount of time 
worked to the criterion measures was estimated 
by correlating the mean number of hours 
worked per week during each of the 10 periods 
with the mean Rate of Productivity, and mean 
Errors for corresponding periods. The possible 
influence of the versatility of the proofreaders, 
as determined by ratings, on their criterion 
measures was investigated also by a correla- 
tional approach. The reliability of the various 
criterion measures was determined. 

The data for Rate of Productivity and Errors 
were processed by analysis of variance to 
determine the feasibility of using 10-period 
averages of these criterion measures as indices 
of the level of job performance of the proof- 
readers. 

The present investigators believe that the 
following conclusions are supported: 


1. The Rate of Productivity and Errors measures 
are independent of one another and 
are not influenced by the amount of time 
worked or by Versatility. This independence 
from versatility permits the use 
of Versatility as an additional criterion. 

2. The reliability coefficients for Rate of 
Productivity, Errors and Versatility are 
quite adequate for criterion purposes 
(ranging from .95 to .99). 

3. The study of trends associated with the 
adaptation of employees to the new 
emphasis on errors was inconclusive. 
The collection of data must be continued 
before adequate judgments regarding absolute 
levels of job performance can be 
made. 


Table 3 

F Ratios Obtained from Analysis of Variance of Measures of Job Performance (N = 18) 

                                  Sum of        Variance 
Variable               Source     Squares       Estimate     F 

Rate of productivity   Column      1,534.06       170.45 
                       Row        33,346.39     1,961.55     24.10** 
                       Error      12,452.67        81.39 

Errors                 Column          .205         .029      1.98* 
                       Row            1.279         .075      6.54** 
                       Error          1.762         .012 

* Significant between 5% and 2% level of confidence. 
** Significant beyond 1% level of confidence. 


Concurrently with the completion of this 
criterion study a battery of tests was administered 
to the proofreaders, and now that the 
validity studies can be conducted there appear 
to be two possible approaches. Either the 
criteria can be combined into a composite 
criterion, or the criteria can be predicted 
separately and the predictions combined. 
The present investigators believe that if the 
criteria are to be combined, the relative weight 
to be assigned to each criterion must be 
decided upon by company management per- 
sonnel because they have the experience neces- 
sary for the formulation of such a decision. 
The same weighting can be applied to the 
separate predictions, but predicting the cri- 
teria separately affords the additional ad- 
vantage of ready adaptability to changes in the 
work situation which might demand a re- 
evaluation of the relative weights given to the 
criteria. 


Received November 28, 1950. 





Validity of the Owens-Bennett Mechanical Comprehension Test * 


Robert W. Halliday, Frank M. Fletcher, Jr., and Rita M. Cohen 
The Ohio State University 


This article reports the results of a one-year 
follow-up study of 105 freshman students in 
the College of Engineering at The Ohio State 
University who took Form CC of the Mechanical 
Comprehension Test¹ at the beginning of 
the school year 1949-1950. 

This test was designed, in the words of the 
manual provided with the test, “to measure 
the ability of an individual to understand 
various kinds of physical and mechanical 
relationships,”² which ability was thought to 
be of value in engineering and allied areas. 

Form CC is reported by the authors to be 
more difficult and to yield a slightly wider 
range of scores at high ability levels than 
preceding forms AA, BB, and W1. The items 
(five-alternative multiple choice) are more 
diagrammatic and less pictorial than is the 
case in previous forms. As in the previous 
forms, however, most of the items deal with 
such common cultural objects as stepladders, 
pulleys, and steering arrangements. 

The subjects, together with 20 upperclass- 
men and 5 non-engineers whose scores were 
not used, comprised four class sections of the 
Engineering Drawing course required of all 
freshman engineers during their first quarter. 
Selection of the sample (19% of the entire 
freshman class in the College of Engineering) 
was arbitrary, inasmuch as no selective prin- 
ciple is known to be operant in the assignment 
to class sections. 

After the distribution was determined, 
scores were correlated with first-quarter and 
first-year grade averages and with specific 
course grades throughout the year. The cor- 
relations were then compared with those com- 
puted using previously-obtained Ohio State 
Psychological Test scores (Form 23) and 
first-quarter and first-year grade averages. 

* This article is based upon a paper presented at the 
1950 meeting of the American Psychological Association 
at State College, Pennsylvania. 

¹ Owens, W. A., and Bennett, G. K. Mechanical 
Comprehension Test-Form CC. New York: The Psychological 
Corporation, 1949. 

² Owens, W. A., and Bennett, G. K. Manual for 
Mechanical Comprehension Test-Form CC. New York: 
The Psychological Corporation, 1949. Page 1. 


Results 


Mean and standard deviation for the present 
group of Ohio State freshmen, together with 
comparable data for Iowa State and Princeton 
groups as reported in the manual cited above, 
are presented in Table 1. 


Table 1 

Comparison of Ohio State Freshman Engineering 
Group with Freshman Engineers at 
Iowa State and Princeton 

School                            N      Mean     S.D. 

Ohio State Engineering            105    36.15    8.59 
Iowa State Agric. Engineering     175    36.59    7.83 
Iowa State Engineering            325    39.27    7.77 
Princeton Engineering             148    41.97    7.38 




It will be noted that the Ohio State students 
have a somewhat lower mean score, and 
somewhat more variable scores, than the 
Iowa State engineers (all curricula 
combined) and the Princeton students. 
The differences are significant at the 1% level, 
using Student’s t. 
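
A t ratio of this kind can be recomputed from the means, standard deviations, and N's in Table 1. The sketch below is a rough check and not the authors' worksheet. It assumes a pooled-variance t for independent groups, treats the reported standard deviations as if they were computed with N − 1, and takes the "Iowa State Engineering" row (N = 325) as the all-curricula group; each of these is an assumption, and the function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t ratio and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# (mean, S.D., N) as read from Table 1.
ohio = (36.15, 8.59, 105)
iowa = (39.27, 7.77, 325)
princeton = (41.97, 7.38, 148)

t_iowa, df_iowa = pooled_t(*iowa, *ohio)
t_princeton, df_princeton = pooled_t(*princeton, *ohio)
print(f"Iowa State vs. Ohio State: t = {t_iowa:.2f}, df = {df_iowa}")
print(f"Princeton vs. Ohio State:  t = {t_princeton:.2f}, df = {df_princeton}")
```

With several hundred degrees of freedom the two-tailed 1% critical value of t is close to 2.6, so both ratios computed this way exceed it.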


Correlations between Course Grades 
and Averages 


Pearson correlations between test scores and 
first-quarter grade averages and between test 
scores and selected first-quarter course grades* 
are reported in Table 2. 

The correlation of +.42 with first-quarter 
grade averages is the same as that reported for 
260 engineering freshmen at Iowa State, using 
first-term averages in Mathematics, Chemistry, 
and Drafting, and somewhat higher than the 
reported + .36 for 113 engineering freshmen at 
Princeton, using all courses during the first 
term. 

* Correlations with grades in an engineering survey 
course, and with courses in Military Science, Physical 
Education, and Hygiene were not computed, although 
the grades obtained in these courses were included in the 
first-quarter grade averages. 


Table 2 

Correlations Between Form CC Test Scores and First- 
Quarter Grade Averages * and Between Test 
Scores and Selected First-Quarter 
Course Grades 

                                  Mechanical 
                                  Comprehension 
Courses and Grade Average         Form CC 

Engineering Drawing               +.43 (N = 104) 
College Algebra                   +.24 (N = 83) 
General Chemistry                 +.21 (N = 77) 
Engineering English               +.14 (N = 88) 
First Quarter Grade Averages      +.42 (N = 105) 

* The different N’s result from individual variations 
in schedule, which variations include the requirement 
in some cases of remedial courses. 


In Table 3 are reported correlations between 
test scores and first-year grade averages and 
between test scores and selected second and 
third-quarter courses for the 79 freshmen who 
completed three quarters of engineering. The 
first-year grade averages again include all 
courses taken, and the correlation between test 
scores and first-year grade averages includes 
three freshmen who were dismissed from school 
at the end of the third quarter because of poor 
grades. 

It will be noted from Table 3 that second and 
third-quarter correlations with Form CC test 
scores were highest for the Geometry courses 
and the advanced Engineering Drawing course. 

In summary of the correlational aspect of the 
study, it may be said that the Form CC test 
scores correlated with grade averages for freshman 
engineers at the +.40 level, that this 
order of correlation is consistent with the 
results reported by the authors of the test for 
Iowa State and Princeton freshman engineers, 
and that the correlation seems to derive from 
relationship to Engineering Drawing and 
Geometry achievement. 


Table 3 

Correlations Between Form CC Test Scores and First- 
Year Grade Averages and Between Test Scores 
and Selected Second and Third- 
Quarter Course Grades 

                                      Mechanical 
Grade Averages and                    Comprehension, 
Course Grades                         Form CC 

First-Year Averages                   +.40 (N = 79) 
Descriptive Geometry                  +.28 (N = 87) 
Trigonometry                          +.08 (N = 80) 
Analytic Geometry                     +.31 (N = 55) 
Advanced Engineering Drawing          +.30 (N = 61) 
General Chemistry, 2nd Course         +.13 (N = 65) 


Mean Test Score Differences and 
“Normal Attrition” 


Inasmuch as the rate of attrition was rather 
high in the present study, as it is reported to be 
in other engineering colleges, some interest 
attaches to the mean scores of such student 
groups in the present study as: (1) those who 
transferred; (2) those who were dismissed 
because of poor grades; and (3) those who 
successfully completed their first year in 
engineering. 

In Table 4 are presented the N’s, mean 
scores, standard deviations, and significance of 
the differences between mean scores for such 
groups. 


Table 4 

Mean Test Scores and Standard Deviations as Related 
to College Status at the End of One Year* 

Attrition Groups                                   N     Mean     S.D. 

Entering group                                           36.2 
Transferred before end of 1 yr.**                        30.5† 
Dismissed at end of 1 yr. or before                      30.5 
Withdrew before end of 1 yr.                             26.0 
Transfers, withdrawals, and dismissals combined          30.0† 
Those successfully completing 1 yr.                76    38.3     8.0 

* One student withdrew during the Winter Quarter 
and returned for the Spring Quarter; he is included 
above in the “Entering Group” only. 

** Eight transferred to Commerce, 6 to Liberal Arts, 
2 to Agriculture, and 1 to Education. Mean grade- 
average was 1.91. (Average of “C” = 2.00.) 

† Difference from mean of Entering Group significant 
beyond 1% level. 

It will be seen from Table 4 that the test 
gives promise of differentiating students who 
will succeed in an engineering curriculum from 
those who will not complete their courses. 
The fact that first-year successful students as 
a group did not differ significantly from the 
entire group at entrance suggests that it 
would probably not be possible to establish 
both high and low cutting scores which would 
separate the entering group into: (a) those 
most likely to complete; (b) those least likely 
to complete; and (c) those for whom no pre- 
diction would be possible. 

The fact that both the successful and the 
entrance groups differed to a significant extent 
from the students who transferred, dropped 
out, or were dismissed suggests, however, the 
possibility of establishing a single cutting 
score which might separate those most likely 
to discontinue engineering during their first 
year from an undifferentiated group containing 
both the successfuls and an additional group 
of transfers, withdrawals, and dismissals. The 
raw-score distribution for the successful stu- 
dents and the distribution for the combined 
transfers, withdrawals, and dismissals showed 
a considerable overlapping at the lower end. 
A cutting score of 35, for example, which would 
have separated a large proportion of the un- 
successful students in the present study (24 
out of a total of 28) would also have separated 
approximately one-third (24 out of 76) of the 
students who successfully completed one year 
in the College of Engineering. In other words, 
if a situation exists in which more students 
apply for engineering than can be trained, and 
if the primary selection objective is to reduce 
failure to a minimum, a high cutting score on 
this test would be efficient. 

If, however, the primary purpose of selection 
is to be sure that nobody who has the potential 
ability to succeed for one year in engineering 
is denied admission, a lower cutting score would 
be more advantageous. A cutting score of 30, 
in the present study, would have separated 
approximately two-fifths (11 out of a total of 
28) of the unsuccessful students and only one- 
tenth (8 out of 76) of the successful students. 
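
The trade-off between the two cutting scores can be laid out from the counts given in the two preceding paragraphs. The sketch below only rearranges those counts. It assumes that "separated" means falling below the cutting score and would therefore mean rejection; the dictionary layout and function name are hypothetical.

```python
# Counts of students falling below each cutting score, as given in the text:
# (unsuccessful below cut, successful below cut)
BELOW_CUT = {35: (24, 24), 30: (11, 8)}
N_UNSUCCESSFUL = 28
N_SUCCESSFUL = 76


def cutting_score_summary(cut):
    """Proportions screened out at a cutting score, from the reported counts."""
    bad_below, good_below = BELOW_CUT[cut]
    return {
        "unsuccessful_screened": bad_below / N_UNSUCCESSFUL,
        "successful_screened": good_below / N_SUCCESSFUL,
        # Share of all rejected students who in fact completed the year.
        "rejected_who_succeeded": good_below / (bad_below + good_below),
    }


summaries = {cut: cutting_score_summary(cut) for cut in BELOW_CUT}
for cut, summary in summaries.items():
    print(cut, {name: round(value, 2) for name, value in summary.items()})
```

The third figure makes the cost of the higher cutting score explicit: at 35, half of those screened out were students who went on to complete the year.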

In terms of the announced objectives for the 
test, the present study supports the authors’ 
contention that Form CC is a high-ability test, 
rather than a “minimum essentials” test, of 
ability to absorb engineering training. The 
fact that there was little overlap at the upper 
end of the distribution, moreover, coupled 
with the fact that no student in the present 
study reached the maximum score, suggests 
that the test may prove useful for such pur- 
poses as awarding scholarships. 


Relative Efficiencies of Form CC and the 
Ohio State Psychological Test 


The final question which it was hoped this 
study might help answer is: “What is the relative 
efficiency of the Form CC as compared 
with other tests which might be used?” 

Accordingly, validity coefficients were com- 
puted for test scores of the 105 freshmen on 
the Ohio State Psychological Test, Form 23, 
using the same first-quarter and first-year 
grade-averages criteria used in the Form CC 
computations. The results are presented in 
Table 5. 

Inasmuch as the Ohio State test is usually 
characterized as predominantly “verbal,” the 
obtained validities are rather surprising. These 
results suggest that: (1) the Ohio State test 
may not be as “verbal” as has been supposed; 
and (2) freshman engineering courses may 
contain a sizable “verbal” component. 


Table 5 

Ohio State Psychological Test Validities as 
Compared with Form CC Validities 

                                              Ohio State 
                                  Form        Psychological 
Grade Averages                    CC          Test 

First-Quarter Grade Averages      +.42        +.59 
First-Year Grade Averages         +.40        +.55 





The validities cited above suggested that a 
multiple correlation involving Form CC scores, 
Ohio State Psychological test scores, and first- 
quarter and first-year averages should be run. 
This was done, but the obtained multiples 
(+.61 for the first quarter and +.58 for the 
first year) were only slightly greater than those 
for the Ohio State test alone (+.59 and +.55). 
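
How little a second predictor adds depends on its correlation with the first, which the article does not report. The sketch below uses the standard two-predictor formula for the multiple correlation, R² = (r₁² + r₂² − 2·r₁·r₂·r₁₂) / (1 − r₁₂²). The intercorrelation of .50 between Form CC and the Ohio State test is a purely hypothetical value chosen for illustration, and the function name is likewise an assumption.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


ASSUMED_R12 = 0.50  # hypothetical Form CC vs. Ohio State test correlation (not reported)

first_quarter = multiple_r(0.42, 0.59, ASSUMED_R12)
first_year = multiple_r(0.40, 0.55, ASSUMED_R12)
print(f"first quarter: R = {first_quarter:.2f}")
print(f"first year:    R = {first_year:.2f}")
```

Under this assumed intercorrelation the multiples exceed the single validities of the Ohio State test by only a few hundredths; a different intercorrelation would change the figures.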


Summary 


1. Published norms for Form CC of the 
Mechanical Comprehension Test may be some- 
what high when the test is used in engineering 
schools where admission is restricted only by 
competition from other engineering schools in 
the same area. 

2. The test appears to measure some of the 
abilities required in Engineering Drawing and 
in Descriptive and Analytic Geometry courses 
at the college level. 

3. The test gives promise of differentiating 
(via a cutting-score approach) students who 
will successfully complete one year of engineer- 
ing training from those who will not complete 
their first year. 

4. The differentiation appears to be due 
primarily to a high-ability factor, rather than 
to a “minimum-essentials” type of appraisal. 
Accordingly, it may prove to be especially 
useful in selecting students for scholarships 
and other subsidies. 




5. The efficiency of the Form CC as com- 
pared in this study with that of the Ohio State 
Psychological test, Form 23, was such as to 
suggest that: (a) Use of the Form CC adds 
little to the over-all correlations obtained with 
the Ohio State Psychological Test; and (b) 
When time of administration is an important 
factor, the thirty to forty minutes required for 
the Form CC may make this test preferable to 
the Ohio State test, which requires two to 
three hours. 


Received November 3, 1950. 





Comparative Validities of “Short” Versus “Long” Tests * 


Leland D. Brokaw 


USAF Training Command, 
Human Resources Research Center, 
Directorate of Personnel Research, 
Lackland Air Force Base, San Antonio, Texas 


When the Air Force began classifying basic 
airmen for training in the more than 150 
technical specialties taught in its specialist 
schools it was discovered that complete cover- 
age by tests which approached factorial purity 
would require a test battery at least three days 
in length. Because it was felt that such an 
arduous period might reflect unfavorably on 
the motivation of the examinees, an investigation 
of possible methods of reducing the testing 
period was made. 

As a possible device for reducing the length 
of the requisite testing period, reduction in 
length of the individual tests was considered. 
Examination of appropriate statistical formulae 
revealed that a change in test length has
greater effect on the magnitude of the reliability
coefficient than it does on the size of the
validity coefficient. The mathematical relationships
existing between length, reliability,
and validity have been well established and
stated by Thurstone (7).


The formula for the validity of a test increased
in length n times is

$$ r_{x_n y} = \frac{r_{xy}}{\sqrt{\dfrac{1}{n} + \left(1 - \dfrac{1}{n}\right) r_{xx}}} \qquad (1) $$

in which

$r_{xy}$ = established validity of test x with criterion y,

$r_{xx}$ = self-correlation, or reliability of the test, and

$r_{x_n y}$ = estimated validity of test n times longer with criterion y.

* This study was [illegible] as a Master’s thesis at
the University of Texas under the direction of Dr.
Gordon V. Anderson. The opinions expressed in this
article are those of the writer and are not necessarily
the official views of the United States Air Force.


Inspection of this formula shows that for an
alteration in length the variation of the validity
is approximately in proportion to the square
root of the original reliability coefficient. The
effect of the change in length upon the reliability
of the test is indicated by the Spearman-Brown
prophecy formula as presented by Guilford (4).
The general formula which applies
to any amount of increase in test length is
given as follows:


$$ r_{nn} = \frac{n\, r_{xx}}{1 + (n - 1)\, r_{xx}} \qquad (2) $$

in which

$r_{nn}$ = the reliability of the altered test,

$n$ = the number of times the test is lengthened, and

$r_{xx}$ = the original reliability of the test.


From these formulae it is seen that the 
validity of a test which has been altered in 
length is a function of the change made in 
length and of the reliability of the original test 
as well as the original validity, whereas the 
reliability of a test altered in length depends 
only upon the amount of change in test length 
and the original level of reliability. It is also 
apparent that for a given amount of change in 
length the numerical magnitude of the relia- 
bility coefficient will be altered more than the 
coefficient of validity. 
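
The comparison drawn above can be checked numerically. The following is a minimal sketch of formulae (1) and (2) as reconstructed here; the function names are illustrative, the reliability of .91 is the figure the text later gives for the full-length EI test, and the validity of .46 is chosen only for illustration.

```python
import math


def spearman_brown(r_xx: float, n: float) -> float:
    """Formula (2): reliability of a test whose length is multiplied by n."""
    return n * r_xx / (1.0 + (n - 1.0) * r_xx)


def lengthened_validity(r_xy: float, r_xx: float, n: float) -> float:
    """Formula (1): validity of a test whose length is multiplied by n."""
    return r_xy / math.sqrt(1.0 / n + (1.0 - 1.0 / n) * r_xx)


# Illustrative inputs (assumptions, see the lead-in): halving a test
# corresponds to n = 0.5.
r_xx, r_xy, n = 0.91, 0.46, 0.5

half_reliability = spearman_brown(r_xx, n)
half_validity = lengthened_validity(r_xy, r_xx, n)

print(f"reliability: {r_xx:.2f} -> {half_reliability:.3f}")
print(f"validity:    {r_xy:.2f} -> {half_validity:.3f}")
```

Under these assumed inputs the reliability falls by roughly .075 and the validity by roughly .02, which is the asymmetry the argument relies on.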

In the practical situation with which we are 
concerned, classification of any individual is 
based upon the pattern of a series of standard 
scores. These scores, in turn, are based upon 
composite scores determined by the cumulation 
of standard scores for each test, after each of 
them has been given its optimum weight for 
each specialty. These weights are based upon 
the regression weights determined from a mul- 
tiple correlation of the several tests in the 
battery with the criterion measure. Because 
the tests are so combined it was believed that 
the level of reliability of individual tests in the 
composite might be lowered considerably with- 
out major damage to the reliability of the 
composite scores. It was further believed that 
the numerically smaller loss in the validity 
coefficients might make for even less effect on 
the validity of the composite score. Further- 
more, a battery of X tests now requiring a 
certain testing time could be expanded to in- 
clude several more tests, in the same time, if 
the original tests were individually shortened. 
These “new” tests could include unique variance
which would produce a higher composite
validity in the case of the second battery than 
of the first. 

As an additional tool to aid in testing these
hypotheses a formula for prediction of the
correlation between two tests, both of which
have been altered in length, was derived by
Dailey (2). It is an expansion of formula (1)
above, using the same notation with the exception
that y now stands for a second test, and
n’ indicates the degree to which its original
length is altered.

$$ r_{x_n y_{n'}} = \frac{1}{\sqrt{\dfrac{1}{n} + \left(1 - \dfrac{1}{n}\right) r_{xx}}} \cdot \frac{1}{\sqrt{\dfrac{1}{n'} + \left(1 - \dfrac{1}{n'}\right) r_{yy}}} \cdot r_{xy} \qquad (3) $$
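
Formula (3) amounts to applying the length factor of formula (1) once for each test. A minimal sketch follows; the function names and the numerical inputs are hypothetical and are not taken from the study's tables.

```python
import math


def length_factor(r_self: float, n: float) -> float:
    """Multiplier applied to a correlation when one test's length is multiplied by n."""
    return 1.0 / math.sqrt(1.0 / n + (1.0 - 1.0 / n) * r_self)


def altered_intercorrelation(r_xy: float, r_xx: float, r_yy: float,
                             n: float, n_prime: float) -> float:
    """Formula (3): correlation between tests x and y after both change length."""
    return length_factor(r_xx, n) * length_factor(r_yy, n_prime) * r_xy


# Hypothetical full-length values: intercorrelation .68, reliabilities .84 and .91.
predicted = altered_intercorrelation(0.68, 0.84, 0.91, 0.5, 0.5)
unchanged = altered_intercorrelation(0.68, 0.84, 0.91, 1.0, 1.0)

print(f"both tests halved: {predicted:.3f}")
print(f"lengths unchanged: {unchanged:.3f}")
```

Setting n = n’ = 1 leaves the correlation unchanged, which is a convenient check on the reconstruction.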


General Plan for the Study 





To evaluate this hypothesis, that individual 
tests of high reliability and substantial validity 
might be considerably shortened, as used in 
a battery of tests, without serious damage to 
the resultant composite validity, it was decided 
that a battery of tests of known reliability and 
validity for a technical specialty would be 
selected. On the basis of the present validity 
and reliability of these tests, by use of formulae 
presented above, the validities and intercor- 
relations of the tests reduced to half-length 
would be predicted. Composite validities based 
upon the intercorrelations and validities of 
these full-length and the predicted half-length 
tests would be compared with the composite 
validities of actual half-length test scores. 


The Test Battery 


Six tests were selected for use in the study. 
They are comparatively unspeeded power 
tests, more appropriate to the problem than 
the temporally much shorter speed tests which 
also appeared in the basic battery of 13 tests 
which has been fully described by Dailey,
Brokaw and Lecznar (3). Because items from
these tests are still in operational use, sample
items cannot be given; however, a brief description
of each test follows.


Aviation Information (AI) is primarily a test
of background information specific to aircraft
and subjects closely related to flying. It is
designed to measure the amount of technical
and practical information an individual may
have relevant to the subject area. The long
form used in this study was composed of 44
items.

Arithmetic Reasoning (AR) follows the classic 
mathematical-reasoning technique of verbal 
presentation of mathematical problems, re- 
quiring the examinee to arrive upon his own 
method for solution of the problems. The 
long form used in this study was composed of 
30 items. 

Background for Current Affairs (BCA) is a
test of general information, primarily intended
to evaluate the examinee’s awareness of the
world in which he lives, not so much in terms
of topical current events as in the fields of
economic law, science, and geography. The
long form used in this study was composed of
44 items.

Electrical Information (EI) is a test of practical
electrical information. It tends to avoid
the classroom concepts of electricity in favor 
of the type of information that would be accu- 
mulated while working around automobiles, or 
in similar mechanical areas. The test is com- 
posed of two parts, the first part being con- 
cerned with electrical vocabulary, the second 
with the techniques and principles of practical 
electricity. Operationally the test is scored as 
a single unit; the same technique was used in 
this study. The long form of the test was 
composed of 86 items. 

General Mechanics (GM) is a test of mechani- 
cal information of the type which might be 
learned by observing various tools and devices 
in use. It is intended to evaluate simple infor- 
mation of a mechanical nature, without going 
into the dynamics of machines in motion, which 
area is covered by the next test below. The 
long form of the test used in this study was 
composed of 44 items. 

Mechanical Principles (MP) is primarily con- 
cerned with comprehension of the basic laws 
of mechanics, but is presented in non-academic 
form, as understanding of machines in motion. 
The long form used in this study was composed 
of 40 items. 


Hereinafter these tests will be referred to by 
the initials given after their names. 


The Experimental Sample 


The experimental group chosen for the study 
consisted of 223 graduates of the Aircraft and 
Engine (Jet) Mechanics school at Chanute Air 
Force Base, Rantoul, Illinois. The criterion 
measure used was based on actual course 
grades, being the arithmetic mean of the grades 
assigned in each of the branches of the school. 

These men were tested on the routine ex- 
perimental classification battery at Lackland 
Air Force Base, San Antonio, Texas, between 
May 1, 1947 and September 30, 1947. 


Technique of the Study 


By use of formulae 1 and 3, the intercorrela- 
tions and validities of the six half-length tests 
were predicted. The composite validities of 
the full-length tests and predicted half-length 
tests were compared with the composite vali- 
dities of the two half-length batteries formed 
by cutting the full-length tests in half. 







Each half-length score was synthesized from 
the full-length answer sheets by the use of 
scoring keys based upon equivalent splits of 
the items used in the test. The probability 
of a fundamental difference between the scores 
that an individual might make on two versions 
of the same test, if in one case the items appear 
alone, and in the other the items are mixed 
with an equal number of equivalent items has 
been explored by Thurstone and found to be 
very slight (7). 

The equivalent splits were made for each 
test in the following manner: first, the items 
were classified by content, then the items in 
each subgroup were split according to difficulty 
indices computed by using the upper and lower 
27% of a sample of 741 unclassified airmen. 
The items were further arranged to equate the 
internal-consistency phi coefficients within 
each subgroup between half-tests. The phi 
coefficients had been computed on the same 
sample as the difficulty indices. Care was 
taken to equate the standard deviations of the 
distributions of difficulty indices, and insofar 
as possible the ranges between forms were made 
equivalent. 
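
The splitting procedure is described only in outline, so the following is a rough sketch rather than the method actually used. It assumes that items within each content subgroup are ordered by difficulty and dealt to the two halves in an ABBA pattern; the function name and the item data are hypothetical, and the further equating of phi coefficients and standard deviations described above is not attempted.

```python
from collections import defaultdict
from statistics import mean


def equivalent_split(items):
    """Split items into two halves matched on content and difficulty.

    `items` is a list of (item_id, content_group, difficulty) tuples.
    """
    by_content = defaultdict(list)
    for item in items:
        by_content[item[1]].append(item)

    half_i, half_ii = [], []
    for group in by_content.values():
        ordered = sorted(group, key=lambda item: item[2])
        for position, item in enumerate(ordered):
            # ABBA dealing: positions 0, 3, 4, 7, ... go to half I,
            # positions 1, 2, 5, 6, ... go to half II.
            if position % 4 in (0, 3):
                half_i.append(item)
            else:
                half_ii.append(item)
    return half_i, half_ii


# Hypothetical items: (id, content group, proportion passing).
items = [
    (1, "vocabulary", 0.30), (2, "vocabulary", 0.50),
    (3, "vocabulary", 0.70), (4, "vocabulary", 0.90),
    (5, "principles", 0.40), (6, "principles", 0.45),
    (7, "principles", 0.80), (8, "principles", 0.85),
]

half_i, half_ii = equivalent_split(items)
mean_i = mean(d for _, _, d in half_i)
mean_ii = mean(d for _, _, d in half_ii)
print(f"half I:  n={len(half_i)}, mean difficulty={mean_i:.4f}")
print(f"half II: n={len(half_ii)}, mean difficulty={mean_ii:.4f}")
```

With these made-up items the two halves have the same size and the same mean difficulty, which is the property reported for the actual splits in Table 1.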

Three of these tests, AI, BCA, and GM, 
were originally composed of 45 items each. 
In order to permit an even split, one item was 
cast out of each of these tests at random. 

Statistics descriptive of the splits which were 
made appear in Table 1. 


Results 


Distributions of the scores on the short 
forms indicated that for the most part the tests 
had been split accurately. These figures are 
reported in Table 2. There is some indication 
that MP was somewhat more difficult for the 
group than the other tests, in terms of the 
somewhat low level of the mean score. This 
fact becomes less important in view of the 
equivalence of each pair of the short tests as 
indicated by the equivalence of their means 
and standard deviations. The single exception 
to this equivalence was AI which showed a 
significant difference between the means of 
the two half-length tests. Careful examina- 
tion of the statistics involved does not reveal 
the reason for this discrepancy. Inspection of 
the figures reveals no reason to expect that this 
discrepancy would have any specific effect 
upon the results of this study. 


Table 1


Statistics Descriptive of Equivalent Halves of Tests *


Test   Statistic                 Half-Test I   Half-Test II   Total Test

AI     Difficulty: Mean             .564          .564           .564
       Difficulty: SD               .205          .204           .205
       Phi coefficient: Mean        .390          .356           .373
AR     Difficulty: Mean              --           .750           .750
       Difficulty: SD                --           .095           .091
       Phi coefficient: Mean        .453          .535            --
BCA    Difficulty: Mean              --           .658            --
       Difficulty: SD                --            --             --
       Phi coefficient: Mean         --            --             --
EI     Difficulty: Mean              --            --             --
       Difficulty: SD                --            --             --
       Phi coefficient: Mean         --            --             --
GM     Difficulty: Mean              --            --             --
       Difficulty: SD                --            --             --
       Phi coefficient: Mean         --            --             --
MP     Difficulty: Mean              --            --             --
       Difficulty: SD                --            --             --
       Phi coefficient: Mean        .361          .404           .383


* Based upon 741 unclassified basic airmen tested in
June, 1948.
-- Entry not legible in the source.


Alternate-form reliability coefficients for the 
half-length tests were corrected by use of the 
Spearman-Brown prophecy formula to yield the 
split-half reliability coefficients for the full- 
length tests. Comparison of these figures 
with split-half reliability coefficients, of the 
odd-even type, given by Cowles (1), and also 
reported in Table 2, reveals relatively close 
agreement. It is to be anticipated that ap- 
proximately equal reliability coefficients will 
be given by the odd-even and the equivalent 
splits if the test is not speeded and the test 
content is relatively homogeneous, as is the 
case with the six tests in this battery. 

Reduction of the tests to half-length had
considerable effect on the magnitude of the


Table 2 


Raw Score Distribution Statistics and Test Reliabilities 











Half- Length Test 
I 





Test No. a —— 
Name Items M } SD 


Full-Length Test 





Operational 
No. Test 
Items M rit 





11.25 10.33 4.86 

8.65 8.69 3.13 
13.06 13.20 4.70 
20.17 19.98 7.89 
10.91 10.77 4.26 


15.56 4.04 16.16 4.52 


hy SOR ee 
SSEAUISS\= 


44 21.53 : j 84 
30 17.28 : ; 85 
44 26.31 t i 90 
86 40.15 . ‘ OT 
44 21.63 : é 81 
40 31.67 : ; 80 





* Alternate-form correlation. N = 223 A and E (Jet) Mechanics.
† Corrected for double length by Spearman-Brown prophecy formula.
‡ Odd-even reliability corrected by Spearman-Brown prophecy formula. N = 881 unclassified basic airmen, 1948.
§ One item dropped from operational test to permit even split on half-length test.
** In order to eliminate negative scores to facilitate IBM computation the following scoring formulae were
used for this test: Full-length test = Rights - Wrongs/4 + 20, Half-length test = Rights - Wrongs/4 + 10.


reliability coefficients, lowering them, in the
cases of MP and GM, by as much as .13. The
least change was noted, logically, in the longest
and most reliable test. This was EI, which
dropped only .07, from .91 to .84.

Using these full-length test reliabilities and
the intercorrelations and validities of the full-length
tests, the intercorrelations and validities
of the half-length tests were predicted by use of


formulae (1) and (3). Inspection of this pre- 
dicted matrix in combination with the matrices 
produced by the actual intercorrelation of the 
half-length tests reveals a high degree of paral- 
lelism. In nearly every case the predicted 
correlation between two variables is equivalent 
to the arithmetic mean of the actual values 
in the two matrices. The values are so close 
as to render it unnecessary to resort to Fisher’s 


Table 3 





Intercorrelations and Validities of Full-Length Tests, Predicted Half-Length Tests and Actual Half-Length Tests *





Full-Length Tests 





AR BCA 'EI GM MP 





Predicted Half-Length Tests 





BCA EI 





68 68 62 28 
52 46 34 42 
57 45 29 

68 47 

47 





61 62 $ 33 
46 42 d 37 
51 25 

41 

40 





Half-Length Tests (I)
AR BCA EI GM MP





Half-Length Tests (II)
BCA EI GM MP








36 63 61 57 33 
48 41 26 38 

50 36 27 

60 41 

39 


60 61 50 33 
43 42 32 36 
50 46 26 

60 43 

41 





* Decimal points omitted. 







Table 4


Beta Weights and Multiple Correlations of Full-Length Tests, Predicted Half-Length Tests
and Actual Half-Length Tests *


                                     Predicted           Actual Half-Length
              Full-Length           Half-Length          Test I      Test II
Test        Val.   Beta Wt.       Val.   Beta Wt.       Beta Wt.    Beta Wt.

AI           44      098           42      115            140         099
AR           39      142           37      138            113         180
BCA          38      054           --      063            010         076
EI           46      078           --      106            157         069
GM           47      201           --      169            127         206
MP           42      177           --      174            194         149

Multiple R           57                    --             --          --


* Decimal points omitted.
-- Entry not legible in the source.


Z-transformation to compare the correlation 
coefficients. In every case the validity cor- 
relations for individual tests fell within very 
narrow limits, as shown in Table 3. 

The degree of agreement among these ma- 
trices suggests the results of their multiple 
correlations, even before these coefficients are 
inspected in Table 4. We find that here, too, 
the results of the two actual matrices are 
only very slightly different from the predicted 
multiple validity, which is .01 less than that 
of the full-length test battery. With a reduc- 
tion in length of the composite tests it is in- 
evitable that the validity shall drop, if the 
constituent elements are homogeneous. The 
important element here is the fact that the 
eliminated halves of the six tests were con- 
tributing only .01 to the multiple correlations 
of the battery. Additional unique tests, even 
of comparatively low individual validity, 
should contribute more than that to the com- 
posite validity. 
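
The multiple correlation of a battery can be recovered from its beta weights and validities through the standard identity R² = Σ βᵢrᵢ. The sketch below applies it to the full-length columns of Table 4 as they can be read from the scan; the pairing of each validity with its beta weight, and the test labels attached to the rows, are assumptions about the table's layout.

```python
import math

# Full-length validities and beta weights read from Table 4 (decimal points
# restored). The row labels are assumed to follow the usual test order.
validities = {"AI": 0.44, "AR": 0.39, "BCA": 0.38, "EI": 0.46, "GM": 0.47, "MP": 0.42}
beta_weights = {"AI": 0.098, "AR": 0.142, "BCA": 0.054, "EI": 0.078, "GM": 0.201, "MP": 0.177}

# Standard identity for standardized regression: R^2 = sum(beta_i * r_i).
r_squared = sum(beta_weights[test] * validities[test] for test in validities)
multiple_r = math.sqrt(r_squared)

print(f"R^2 = {r_squared:.4f}, R = {multiple_r:.2f}")
```

The result rounds to .57, in agreement with the multiple R tabled for the full-length battery.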

It is at this point that another problem 
appears. The test lengths have been reduced 
by one-half; there has been some reduction 
in the reliability of the individual tests, but 
the battery validity, as revealed in the multiple 
correlations, has not suffered significantly. 

Can it be that the short-test battery would 
be as efficient in prediction as the long-test 
battery has been? The answer to this question
lies in an evaluation of the comparative reli- 
abilities of the short-test and long-test batteries. 


Following a technique suggested by Ferguson 
and Jackson (5) the battery reliabilities were 
computed. 

Computation of the reliabilities of the short- 
test batteries revealed that they had indeed 
been reduced, but still stood at a level of reli- 
ability (.90) often accepted as being adequate 
for individual tests. Correction of the battery 
validities for the attenuation due to reduction
in reliability, by a formula given by Thorndike
(6), indicates that there has been a negligible
loss of validity in the shortening process, but
the loss that did occur was probably due to
the lowered reliability of the short-test battery.
The battery reliabilities and corrected validity
coefficients are reported in Table 5.


Table 5


Battery Reliabilities and Corrected
Composite Validities *


                        Battery †      Composite     Corrected ‡
                        Reliability    Validity      Composite

Full-length test            95            57             60
Predicted half-test         90            56             62
Half-test I                 90            55             61
Half-test II                90            57             63


* Decimal points omitted.
† Estimated by use of Jackson-Ferguson technique.
‡ Computed by use of formula (4).


The formula for correction of an obtained
score to a condition of perfect reliability
follows (6):

$$ r_{\infty c} = \frac{r_{tc}}{r_{t\infty}} \qquad (4) $$

where

$r_{\infty c}$ = correlation of true score with criterion,

$r_{tc}$ = correlation of obtained score with criterion,

$r_{t\infty}$ = correlation of obtained score with true score.
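
The entries of Table 5 can be checked against one another. The sketch below assumes that the tabled battery reliability is used directly as the divisor in formula (4); that assumption is made here only because it reproduces the tabled corrected values. Dividing by the square root of the tabled reliability instead would give about .58 for the full-length battery.

```python
# (battery reliability, composite validity) from Table 5, decimal points restored.
batteries = {
    "Full-length test": (0.95, 0.57),
    "Predicted half-test": (0.90, 0.56),
    "Half-test I": (0.90, 0.55),
    "Half-test II": (0.90, 0.57),
}

# Assumption: the divisor is the tabled battery reliability itself.
corrected = {
    name: validity / reliability
    for name, (reliability, validity) in batteries.items()
}

for name, value in corrected.items():
    print(f"{name}: {value:.2f}")
```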


Conclusions 


By reduction of each of the six tests in the 
battery to half its original length we have made 
a very small reduction in their composite 
validity. Nearly half of the time formerly 
occupied by these tests in their long form is 
available for use with other types of tests. 
Thus it is that with little, if any, loss of present 
validity the opportunity is offered for possible 
inclusion of other tests which will, by their
contribution of unique validity, raise the 
validity of the battery significantly without in- 
creasing the present total testing time. 

In the particular case at hand it should be 
noted that even if there were no unique tests 
to insert in the available testing time such a 
reduction might well be made in the interest 
of the time that might be saved. The reli- 
ability of the “short” battery (.90) is high 
enough to permit its operational use. 

The implications of the theory demonstrated 
in this study are of extreme importance to 
every organization which uses tests in batteries 
from which is derived a composite score. The 
possibility of reduction in length of the bat- 
teries required for classification of men for 
government and industry is significant not 
only in terms of dollars and cents but also in 
terms of testing theory. It has been demon- 
strated that previous belief in the necessity for 

using quite long tests is unfounded when the
tests are to be used in a battery. This study
suggests strongly that present concepts of
test reliability might well be scrutinized.




The present study falls short of indicating 
the final limit to which the process might be 
adjusted. Because the reliabilities of the indi- 
vidual tests were not sufficiently lowered, by 
reduction of the tests to half length, the effect 
of reduction of test reliabilities to extremely 
low levels was not demonstrated. If an arbi- 
trary level of reliability, say .55, had been 
selected, and by the use of the Spearman-Brown 
formula the corresponding test length had been 
estimated, a slightly different body of informa- 
tion should have been discovered. The de- 
sirability of such a study is strongly indicated. 

However, the original purpose has been ac- 
complished. Reduction of the tests to half- 
length has yielded an insignificant drop in 
their composite validity. It has yielded a 
somewhat greater drop in reliability. The 
required testing time has been made available. 


Received November 6, 1950. 


References 


1. Cowles, J. T. Quarterly research report no. 7. San
Antonio, Texas: Psychological Research and
Examining Unit, Lackland Air Force Base, 1948.

2. Dailey, J. T. Determination of optimal test reliability
in a battery of aptitude tests. Technical
Memorandum No. 10. San Antonio, Texas:
3309th Research and Development Group, Lackland
Air Force Base, 1948.

3. Dailey, J. T., Brokaw, L. D., and Lecznar, W. B.
The development of the airman classification
test battery. Research Bulletin 48-4. San Antonio,
Texas: 3309th Research and Development
Group, Lackland Air Force Base, 1948.

4. Guilford, J. P. Psychometric methods. New York:
McGraw-Hill, 1936.

5. Jackson, R. W. B., and Ferguson, G. A. Studies
on the reliability of tests. Bulletin No. 12.
Toronto, Canada: Department of Educational
Research, University of Toronto, 1941.

6. Thorndike, R. L. Research problems and techniques.
Report No. 3, AAF Psychological Program
Research Reports. Washington, D. C.:
United States Government Printing Office, 1947.

7. Thurstone, L. L. The reliability and validity of tests.
Ann Arbor, Michigan: Edwards Brothers, 1937.





A Sales Situation Test * 


Milton L. Rock 
Edward N. Hay & Associates, Inc., Philadelphia, Pa. 


The development of tests that can be used to
adequately select salesmen is still one of the
most difficult problems that sales managers
and personnel managers have today. Because
of past mistakes, many industrial people who 
are responsible for selecting salesmen feel, at 
this time, that sales aptitude is impossible to 
measure. The basic question concerns the 
needs and aptitudes that must be measured. 
Sales aptitude may be defined as the capacity 
of the individual to deal effectively with 
sales environments and includes intelligence, 
interests, personality, judgment, and product 
and market knowledge. Adult intelligence 
can adequately be determined by the use of 
the Bellevue-Wechsler. There are many ade- 
quate measures of interest including Strong 
and Kuder. The tapping of the global 
personality slanted toward the salesman is 
being achieved by the use of projective tech- 
niques such as Thematic Apperception Test. 
Product and market knowledge must be given 
to the salesman through adequate training 
after he has been selected. The missing link 
and one of the most important ones has been 
the element of judgment in the sales situation. 
This test is an attempt to fill that need. 


Selection of Items


It is only common sense when developing a
test to differentiate between selling and non- 
selling personnel, to interview successful men 
in the field in order to find out what it takes 
to make a salesman. Using this technique, 
we asked sales managers from some of the 
leading industrial and commercial companies 
in the country to make up selling situations 
that would require sales judgment. These 
sales situations were edited by three of these 
managers and a group of 25 sales situations 
that were felt to be most applicable were 
included. Each sales situation has four alternate
judgments which are to be ranked from
one to four. The tests were sent out to 25
sales managers and top salesmen in the country
and they ranked the alternate judgments
from one to four for each question. The items
were analyzed for difficulty and the scoring
was developed and weighted accordingly.

* Sales Situation Test, Form A, by Milton L. Rock
is obtainable from Aptitude Test Service, Swarthmore,
Pa.


Population 


A group of 25 salesmen working for a 
large company, selling on a national scope in 
consumer markets, anonymously answered the 
Sales Situations Test and mailed it in from the 
field. Twenty-six production supervisors and 
superintendents from the same company were 
given the Sales Situations Test as one test of 
a supervisory battery. The ages, years of 
experience, technical and educational back- 
grounds of both groups are similar. This 
group of supervisors is also comparable at the 
job evaluation salary level with the sales 
personnel. This test was also administered
to 31 salesmen working for another company,
also on a national scope, but in a technical
field.


Results 


A summary of the main results of the
standardization is found in Table 1. Referring
to Table 1, the range of scores made by
the salesmen in Company No. 1 was from 21 to
41 with a mean of 30.6 and a sigma of 4.8.
The range of scores for the salesmen of Company
No. 2 was from 16 to 32 with a mean of
25.7 and a sigma of 3.8. The range of scores
for the production supervisors was from 8 to
28 with a mean of 18.1 and a sigma of 5.4.
The range of scores for the salesmen, putting both
groups together, is from 16 to 41 with a mean of 28.2
and a sigma of 5.2.


Table 1


Frequency Distribution of Scores on Sales Situation
Test for Two Groups of Salesmen and
Production Supervisors

Salesmen Company No. 1; Salesmen Company No. 2; Production Company No. 1; Salesmen Total

A summary of the critical ratios indicating
the significance of the differences between the means
of the various groups is found in Table 2. The critical ratio
between the salesmen in Company No. 1 and
the production supervisors is 8.66, significant
beyond the 1% level. The ratio between the
salesmen in Company No. 2 and the production
supervisors is 5.89, significant beyond the 1%
level. The critical ratio between both salesmen
groups together and the production
supervisors is 7.94, also significant at the 1%
level. The critical ratio between consumer
salesmen and technical salesmen is 4.17,
significant at the 1% level.


Table 2


Critical Ratios Between Groups Tested on
Sales Situation Test


                          Salesmen         Salesmen         Production
                          Company No. 1    Company No. 2    Company No. 1

Salesmen Co. No. 1                             4.17             8.66
Salesmen Co. No. 2                                              5.89
Salesmen Cos. 1 and 2                                           7.94
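
The article does not state how the critical ratios were computed. The sketch below assumes the usual form, the difference between two means divided by the standard error of that difference, with each variance divided by its group size; the function name is illustrative, and the combined group size of 56 is the sum of the two salesmen groups.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error (assumed form)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# (mean, sigma, N) as reported in the text.
consumer = (30.6, 4.8, 25)      # salesmen, Company No. 1
technical = (25.7, 3.8, 31)     # salesmen, Company No. 2
supervisors = (18.1, 5.4, 26)   # production supervisors, Company No. 1
combined = (28.2, 5.2, 56)      # both salesmen groups together

ratios = {
    "consumer vs supervisors": critical_ratio(*consumer, *supervisors),
    "technical vs supervisors": critical_ratio(*technical, *supervisors),
    "combined vs supervisors": critical_ratio(*combined, *supervisors),
    "consumer vs technical": critical_ratio(*consumer, *technical),
}

for label, value in ratios.items():
    print(f"{label}: {value:.2f}")
```

Under this assumed formula each ratio comes within roughly 0.15 of the figure reported in the text; the small differences may reflect rounding of the published means and sigmas or a slightly different standard-error formula.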

The supervisory population took a battery 
of tests including Wonderlic Personnel Test 
and File-Remmers How Supervise Test. The 
mean of the Personnel Test was 25.9 and the 
sigma was 9.8. The mean of the How Super- 
vise Test was 50.9 and the sigma 8.7. The 
Pearson product-moment correlation between 
the Personnel Test and Sales Situations Test 
is .65. 




DISCUSSION 


It is quite evident from the results that the 
Sales Situations Test differentiates between 
selling and non-selling personnel. It is also 
interesting to note that the highest score 
attained by the production supervisors is 28 
which is the average of the combined salesmen 
group, below the average of the consumer 
group, and just slightly above the average of 
the technical group. There is a possibility 
that this test may differentiate between 
salesmen who sell different types of product. 
In the two groups, consumer versus technical, 
there is a significant difference. 

In sending these questionnaires out, sales 
managers and salesmen were asked to include 
comments. All the comments received to 
date are favorable. The one major suggestion 
that was made by some of the subjects can 
be summarized in the comment from one of 
the men. It should be noticed that this 
comment was the one received from both the 
population samples and the original sales 
managers. 


“I would suggest that there are many of these
so-called situations which are too general. On
these the rating of the answers would depend
very much upon the type of product and also
the prospective clientele. Some of the questions
are basic, and, therefore, the answers
would be representative. On these others
mentioned above, the correct answer would
depend upon the particular situation. Therefore,
I would expect widely scattering answers.”


The test included both basic and general
situations so that it would cover salesmen in
all situations and in all fields and it was
hoped that it could be used as an instrument to
differentiate not only between salesmen and
non-salesmen but between industrial, commercial,
intangible and technical sales personnel.


SUMMARY 


1. The Sales Situations Test differentiates 
between salesmen and non-salesmen. 

2. The Sales Situations Test differentiates 
between commercial and industrial salesmen. 


Received July 23, 1951. 


Simplification of Flesch Reading Ease Formula


James N. Farr, James J. Jenkins, and Donald G. Paterson 
Department of Psychology, University of Minnesota 


Flesch’s 1948 formula (3, 4, 5) for reading 
ease involves only the use of syllable counts 
and sentence length. Experienced and inexpe- 
rienced graduate students alike were able to 
make these counts with a high degree of 
reliability (7). In one portion of this reliabil- 
ity study, counts of the “number of syllables 
per 100 words” yielded reliability coefficients 
of .99 whereas counts of average sentence 
length yielded reliability coefficients of .92. 
In the other portion of this same article, 
eighteen graduate students were reported as 
making the counts (only three of whom had 
had prior experience in applying the Flesch 
formulas). The reliability coefficient for word 
length was .95 and for average sentence 
length it was .99. We may conclude, there- 
fore, that the true reliability coefficient for 
syllable counts lies somewhere between .95 
and .99. Minor clerical errors and difficulty 
in accurately counting the number of syllables 
in polysyllabic words are probably responsible 
for the fact that reliability of syllable counts 
is less than .99.

It occurred to the writers that a further
simplification of the Flesch reading ease formula
is possible by merely counting the number of 
one syllable words per 100 words instead of 
counting the number of syllables per 100 words. 
If the correlation between these two variables 
is sufficiently high the simpler method could 
be substituted without loss of precision in 
calculating reading ease scores. This simpler 
method would obviously be much faster and 
would require no knowledge of syllabification 
on the part of the analyst. It would merely 
require the analyst to recognize and count 
the number of one syllable words in each 100 
word sample. 
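
For reference, the score into which these counts feed can be sketched as follows. The constants are those of Flesch's published 1948 Reading Ease formula and are not stated in this passage; the function and parameter names are illustrative.

```python
def flesch_reading_ease(syllables_per_100_words: float,
                        words_per_sentence: float) -> float:
    """Flesch (1948) Reading Ease: 206.835 - 0.846 * wl - 1.015 * sl."""
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * words_per_sentence


# Hypothetical sample: 150 syllables per 100 words, 20 words per sentence.
score = flesch_reading_ease(150, 20)
print(f"reading ease: {score:.3f}")
```

The proposal above would replace the first argument, which requires syllabifying every word, with a simple count of one-syllable words.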


Review of Literature 


Justification for this simpler method can be 
found in the literature. As early as 1930, 
Johnson (8) proposed that the per cent of 
polysyllabic words found in thirty 100-word 
samples per book could be used as a rough 
measure for determining the reading difficulty 


of elementary school textbooks. He showed
that a series of geography texts, language
texts and readers supposedly adjusted to the 
levels of each of the elementary school grades 
ranged from 85 per cent of one syllable words 
in the first grade to 71 per cent in the eighth 
grade. In 1934, Dale and Tyler (1) computed 
correlations between a large number of 
language factors, such as number of different 
technical words, percentage of one syllable 
words, number of prepositional phrases, etc. 
and tests of comprehension of 74 samples of 
reading material dealing with the topic of 
personal health. The number of different 
technical words yielded the highest correlation 
(.46). Percentage of one syllable words cor- 
related .38 and was thus shown to be as good 
an index or a better index than some eight 
other factors all correlating between .30 and 
.38. In 1935, Gray and Leary published their
important and highly significant monograph 
(6). They computed correlations between 44 
potential elements of difficulty and compre- 
hension tests given to adults. Average sen- 
tence length in words yielded the highest 
correlations (.43 to .52). Percentage of one 
syllable words proved to be approximately 
as good an index of comprehension difficulty
as did average sentence-length in syllables (.43 
for the former and .47 for the latter). They 
list the percentage of one syllable words along 
with average sentence length in words as 
being among the eight most useful indices 
of difficulty (6, p. 130). One additional 
important finding in the Gray-Leary study 
is relevant here. By studying various com- 
binations of four factors each by multiple 
correlation methods, they found that nine 
different combinations will each give an 
estimate of difficulty which approximates that 
obtained by the use of more elements. In 
other words, one quickly reaches the point of 
diminishing returns, so that difficulty can be 
predicted as well by a small number of lin- 
guistic elements as by a larger number. 

Thus, search of the literature suggests that 
the number of one syllable words per 100 
words would yield as good an index of comprehension
difficulty as would the number of
syllables per 100 words. 

An empirical test of this hypothesis was 
undertaken by studying 360 one-hundred word 
samples taken from 22 employee handbooks 
put out by various divisions of the General 
Motors Corporation. 


Procedure 


The 360 one-hundred word samples were 
scored for reading ease according to the 
Flesch formula (3, 4). One syllable word
counts per 100 words for each of the 360 
samples were then made. The equivalence of 
one syllable words per 100 words and the 
number of syllables per 100 words having been 
established, a new reading ease formula was 
derived and a revised table to facilitate the 
computation of the new as well as the old 
reading ease scores was prepared. Then the 
following correlations were computed: 1. 
Number of syllables per 100 words and old 
reading ease index; and 2. Number of one 
syllable words per 100 words and old reading 
ease index. 


Results 


Table 1 gives the statistical constants (means 
and sigmas) for the four variables involved. 
It will be noted that the mean reading ease 
scores by the new and by the old formula are 
substantially the same. The variability of the 
new reading ease scores, however, is slightly 
smaller than the variability of the old reading 
ease scores (14.2 as against 15.7). 

Table 1

Statistical Constants for the Two Syllable Counts
and for the Two Reading Ease Scores Based
on Analysis of 360 Hundred
Word Samples

No. of One Syllable Words per 100 Words
No. of Syllables per 100 Words
Reading Ease Score (Old Formula)
Reading Ease Score (New Formula)

The correlation between the number of one
syllable words and the number of syllables per
100 words was found to be -.91. A study of
the scatter table shows, in general, greater 
discrepancies in the middle ranges of both 
counts as one would expect. At the extremes, 
of course, there is remarkable agreement. 
An extreme discrepancy when it occurs seems 
to be due to peculiarities of particular para- 
graphs. For example, one paragraph with an 
“average” number of one syllable words (65
per cent) was found to have an unusually high
syllabic count due to the frequent repetition 
in the paragraph of two five syllable words. 
Another paragraph had an unusually small 
percentage of one syllable words (47 per cent) 
coupled with a large number of two syllable 
words, hence the discrepancy was in the 
opposite direction. 

This correlation scatter table (r = -.91)
provided the regression needed for a substitut- 
ing of the number of one syllable words per 
100 words for the number of syllables per 100 
words in the Flesch reading ease formula 
assuming that both syllable counts interact 
with average sentence length in much the
same way. The two regression formulas are: 


Old Reading Ease Index
= 206.835 - .846 wl - 1.015 sl


New Reading Ease Index
= 1.599 nosw - 1.015 sl - 31.517,


in which wl = number of syllables per 100
words; sl = average sentence length; nosw =
number of one syllable words per 100 words.
A new table to facilitate the computation of 
reading ease scores was prepared which gives a 
wider range of reading ease scores than was 
true of the Farr-Jenkins tables (2) and which 
can be used for the old formula as well as the 
new. This is Table 2. The upper row at the 
head of Table 2 gives the number of one 


syllable words per 100 words and the lower 


row gives the corresponding value for the 
number of syllables per 100 words. The table 
is simple to use. “Reading Ease” is entered 
vertically by average sentence length and 
horizontally by the number of one syllable 
words per 100 words or by the number of 
syllables per 100 words. The “Reading Ease
Index” figure is given at the point of inter- 
section of the row and column entries. For 
example, if a sample of 100 words has an 
average sentence length of 25 words and has
70 one syllable words (or 149 total syllables), 
the “Reading Ease Index” is 55. This reading 
ease index may then be interpreted directly in 
terms of difficulty by Flesch’s table (3). 
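
The two regression formulas given above can be checked directly against the worked example (average sentence length 25, 70 one syllable words, 149 total syllables). The following is a minimal sketch in Python; the function and argument names are illustrative and are not part of the original report.

```python
def old_reading_ease(syllables_per_100: float, avg_sentence_length: float) -> float:
    """Flesch reading ease index from total syllables per 100 words (wl)."""
    return 206.835 - 0.846 * syllables_per_100 - 1.015 * avg_sentence_length


def new_reading_ease(one_syllable_words_per_100: float, avg_sentence_length: float) -> float:
    """Simplified index from the count of one syllable words per 100 words (nosw)."""
    return 1.599 * one_syllable_words_per_100 - 1.015 * avg_sentence_length - 31.517


if __name__ == "__main__":
    # Worked example from the text: sl = 25, nosw = 70, wl = 149.
    print(round(old_reading_ease(149, 25)))   # 55
    print(round(new_reading_ease(70, 25)))    # 55
```

Because the functions accept any count, intermediate values that fall between the columns of Table 2 can be scored without interpolation.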

It will be noted that Table 2 gives average 
sentence lengths by single steps from 9 to 38 
(rows), but gives number of syllables per 100 
words by two step jumps (columns). This 
requires interpolation to obtain reading ease 
scores for samples having intermediate values. 
For example, if a sample of 100 words has an 
average sentence length of 25 words and has 
71 one syllable words (or 147 or 148 total 
syllables), the reading ease score is 56 or 57 
(strictly speaking 56.5). Of course, if one is 
working with the old formula involving total 
syllable counts (number of syllables per 100 
words), Table 2 will require a greater amount 
of interpolation since the gaps in these counts 
from column to column are apparently greater. 
Since the differences between the reading ease 
scores themselves must be quite large to have 
any real meaning, it is apparent that clerical 
errors in the process of interpolation will 
necessarily be of minor importance. 

The correlation between the two types of 
syllable counts and the old reading ease score
proved to be quite high. The correlation
between number of syllables per 100 words and
the old reading ease score was, of course,
higher (-.87) than that between the number
of one syllable words per 100 words and the 
old reading ease score (+.76), since the former 
correlation involves a part-whole relationship. 

The crucial test of the practical value of the 
simplified syllable count method, of course,
hinges on the correlation between the old 
reading ease score and the new reading ease 
score for the 360 one hundred word samples 
available for study. Each of these samples 
was rescored using the number of one syllable 
words per 100 words. Then the old reading 
ease scores were correlated with the new reading
ease scores. The correlation turned out
to be +.93. Table 3 presents a condensed 
scatter table of this result but using the 
unequal steps involved in Flesch’s table for 
interpreting “difficulty.” There is perfect 
agreement for 237 of the 360 paragraphs. 
There is a disagreement of only one step for
119 paragraphs. Many of these, of course,
are in reality negligible disagreements since
many fall in the upper portion of one category
and the lower portion of an adjacent category.
In only four instances is there a disagreement
of two steps (in one instance the old index was
“Fairly Easy” and the new was “Fairly Difficult,”
and in the other three instances the
old index was “Standard” and the new index
was “Difficult”).

Table 3

Scatter Table Showing Relation Between Flesch’s Old
Reading Ease Categories and the New
Reading Ease Categories

Old Reading Ease Categories (columns): Difficult, 30-50; Fairly Difficult, 51-59; Standard, 60-69; Fairly Easy, 70-79; Easy, 80-89
New Reading Ease Categories (rows): the same categories
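
The tally of agreements and one-step and two-step disagreements described above can be sketched as follows. The category bounds are read from the column heads of Table 3; the bounds for “Very Easy” (90 and over) and “Very Difficult” (below 30) are assumptions, as are all names used in the code.

```python
from collections import Counter

# Lower bound of each category, highest first. Bounds for 30-89 follow the
# Table 3 column heads; the 90+ and below-30 categories are assumed.
LOWER_BOUNDS = [
    (90, "Very Easy"),
    (80, "Easy"),
    (70, "Fairly Easy"),
    (60, "Standard"),
    (51, "Fairly Difficult"),
    (30, "Difficult"),
]
ORDER = ["Very Difficult", "Difficult", "Fairly Difficult",
         "Standard", "Fairly Easy", "Easy", "Very Easy"]


def category(score: float) -> str:
    """Map a reading ease score (rounded to a whole number) to its category."""
    whole = round(score)
    for lower, name in LOWER_BOUNDS:
        if whole >= lower:
            return name
    return "Very Difficult"


def steps_apart(old_score: float, new_score: float) -> int:
    """Number of category steps separating the old and new scores."""
    return abs(ORDER.index(category(old_score)) - ORDER.index(category(new_score)))


def tally(pairs):
    """Count how many (old, new) score pairs differ by 0, 1, 2, ... steps."""
    return Counter(steps_apart(old, new) for old, new in pairs)
```

Applied to the 360 pairs of scores, `tally` would return the three counts reported in the text (perfect agreement, one step, two steps).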

In view of the fact that Flesch himself 
recognizes that his readability yardstick is 
but a crude index of comprehensibility of prose 
passages, the writers are inclined to stress 
the comparability of the new and the old 
formulas.¹ Here we are stressing the comparability
of the new and the old reading ease
indexes as applied to samples of 100 words each. 
But, in actual use the Flesch formula is 
applied to whole books rather than to single 
samples. Thus, a large number of samples 
are drawn and a reading ease index is derived 
by averaging the reading ease for all the 
samples. Such an average, of course, would 
be more stable than scores for single samples. 

A more practical test of the simplified
formula in comparison with the old formula
would be afforded by comparing the average
reading ease indexes for each of the 22 employee
handbooks. These were computed and found
to yield a product-moment correlation of
+.95. That this correlation of averages is no
higher is due to the fact that these handbooks
do not differ much among themselves, ranging
from a mean of 36 (“Difficult”) to 57 (“Fairly
Difficult”).

¹ Flesch (5, p. 37) answers the question “How accurate?”
by saying “The Test will give you only a rough
estimate of readability.”

Thus, it is safe to predict that were the new 
and old formulas to be applied to a wide 
range of books or other prose material with a 
heavy sampling in each difficulty level from 
the “Very Easy” to “Very Difficult” categories, 
the correlation of averages would probably 
reach +.99. The correlation of +.95 reported
in the preceding paragraph should, therefore, 
be looked upon as the lower limits of agreement 
likely to be found in the practical application 
of Flesch’s readability yardstick. 


Summary 


It is proposed to substitute for syllable 
count per hundred words in Flesch’s reading 
ease formula the number of one syllable words 
per hundred words. This latter count could 
be made much more rapidly and would not 


require knowledge of syllabification on the 
part of the analyst. 

Justification for this simplification is found 
in the literature. 

Study of 360 hundred word samples from 22 
employee handbooks showed a high correlation 


between these two variables (-.91). Based
on this correlation, a new reading ease regression
equation was derived and a table to
facilitate computation of the new reading ease
index was prepared. 

Both methods of counting word length 
correlated quite high with reading ease scores 
(.87 and .76). The correlation between the 
old reading ease scores and the new reading 
ease scores for the 360 paragraphs was .93. 
Finally, the correlation between the average
reading ease scores for the 22 booklets computed
on the basis of the old and the new formulas was +.95.

The evidence indicates that the simplified 
formula may safely be substituted for the 
original formula. 


Received July 25, 1951. 
Halo in the Employment Interview

The “halo effect” has received far more
discursive reference than research attention 
since Thorndike (3) first added the term to the 
psychologist’s jargon more than thirty years 
ago. These two ingredients recur in the
many definitions given in the literature: (1)
the undue influencing of the rater’s judgment 
on one or more traits or characteristics by his 
observations with respect to other unrelated 
traits and characteristics; and (2) the similar 
influencing of the rater’s judgment by a 
general attitudinal set and general impression 
with respect to whatever or whoever is judged. 

A useful distinction can be drawn between 
systematic and sporadic halo. By the former 
is meant a persisting tendency in the rater’s 
judgments, relatively independent of what- 
ever or whoever is being judged. As dis- 
tinguished from this, sporadic halo has special 
reference to the case of a rater making judg- 
ments with respect to another individual and 
is thought of as a function of this unique 
inter-personal relationship. While it might 
be surmised as operative in a given rater-ratee 
situation, by definition, it defies systematic 
research since it is non-recurrent. 

Bingham (2), in 1939, added to the psy- 
chological terminology ‘‘valid halo” and ‘‘in- 
valid halo.” By the former is meant the 
expected overlap among traits and the overlap 
between each trait and an over-all or summary 
judgment. Invalid halo, in contrast, refers 
to any excess overlap in these situations
beyond what might be expected. 

Valid halo may be symbolized statistically 
as: i‘ 
heal 

This indicates the intercorrelations in the 
population among j traits and where one of 
these may be, but does not need to be, a 
summary or over-all trait. 


An empirical check of the presence of invalid 
halo in a given rater’s judgments can be made 
by setting an exact hypothesis with respect to 
the intercorrelations among his quantifications 
of judgment on the various traits he is asked 
to rate in relation to the actual or hypothesized 
population correlational matrix. For example, 
in the employment situation in a given com- 
pany, if many persons conduct the interviews 
an intercorrelational matrix including data 
from all interviewers can be computed. This 
can be hypothesized as the population matrix. 
The significance of the deviations of the 
elements in each interviewer’s matrix from the 
corresponding population data can then be 
determined, thereby demonstrating the pres- 
ence, if any, of invalid halo. Obviously, if a 
given interviewer has done a sizeable propor- 
tion of the total interviews considerable 
spuriosity is introduced unless his data are 
evaluated in relation to the company inter- 
correlational matrix with his own ratings 
excluded. The following is a fictitious example 
to illustrate this type of analysis. 

We will suppose that, say, six interviewers
conducted a total of 300 interviews with
applicants for clerical positions in a particular
company. Each interviewer quantified judgment
on each of three traits in rating each
applicant. In Table 1 are presented the
intercorrelational matrices for the company
(N = 300) and for interviewer No. 1 (N = 81).
We have rigged the data so that interviewer
No. 1 clearly shows systematic, invalid halo.
His judgments on trait A markedly influence
his judgments on the other traits.

Table 1

Inter-correlational Matrices of Judgment Quantifications
Made by the Company (All Interviewers)
and One Interviewer

Matrices: Interviewer No. 1; Company. Traits: A, B, C.

* Significantly deviant from corresponding company
values at the .01 level of confidence.
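
The text does not name the significance test applied to the deviations. One common choice, shown here only as an assumed illustration, is Fisher's r-to-z comparison of two independent correlations. The correlation values below are hypothetical; the company figure is taken with interviewer No. 1's own 81 ratings excluded (300 - 81 = 219), as the text recommends.

```python
import math


def fisher_z_difference(r1: float, n1: int, r2: float, n2: int):
    """Compare two independent correlations by Fisher's r-to-z transformation.

    Returns the normal deviate z and its two-sided probability.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


if __name__ == "__main__":
    # Hypothetical values: traits A and B correlate .80 for interviewer No. 1
    # (N = 81) and .45 for the remaining interviewers (N = 219).
    z, p = fisher_z_difference(0.80, 81, 0.45, 219)
    print(f"z = {z:.2f}, p = {p:.6f}")
```

A probability below .01 for a given pair of traits would mark that element of the interviewer's matrix as significantly deviant in the sense of the table footnote.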

Any discussion of halo would be incomplete 
if mention were not made of suggested remedies 
for it. Allport has done an excellent job of 
summarizing these. He suggests the follow- 
ing: “(a) by specific warning against it, 
(b) by employing distinctive and well-defined 
variables, (c) by using alert and trained 
judges, (d) by avoiding characterial and 
censorial variables, (e) by so varying the 
presentation of the qualities to be rated that 
a fresh and independent consideration of each 
is demanded, (f) by avoiding haste and perfunctoriness
in making ratings, and (g) by
averaging together the ratings of several 
judges so that to some degree the prejudices 
of the several judges will cancel one another” 
(1, p. 447). 

Halo is of concern to anyone interested in 
employment interviewing. The writers 
thought that halo in this situation might 
manifest itself in the evidence furnished by 
interviewers to justify the ratings which they 
accord applicants on various traits. In one 
client company a semistructured selection 
interview was introduced. In the procedure 
prescribed each interviewer was required to 
quantify his judgment of each applicant on 
nine traits plus a summary recommendation 
to hire or not. Furthermore he was asked to 
furnish, in spaces provided on the report form, 
evidence to substantiate each judgment on 
each trait. The remainder of the paper¹ is
concerned with reporting upon various analyses 
made of this evidence. It should be noted 
that this method can be used in investigating 
systematic halo as it occurs either in individual 
raters or in consensus judgments of a rater
group. Evidence for the existence of halo can 
be established independently of any inter-rater
comparisons. It is conjectured that the
methodology if not the specific results may
have relatively wide application wherever
ratings are sought in interpersonal situations.

¹ Regan’s findings are reported in detail in his M.A.
dissertation, An investigation of evidence cited to substantiate
quantification of judgments on the part of hiring
interviewers, Fordham University, 1951.


Methodology 


As indicated above the interview situation 
with which we are concerned is semi-directed, 
in that the interviewing executives must, as 
soon as possible upon completion of the inter- 
view, fill out an interview form which requires 
quantification of judgment upon specific traits. 
Judgment quantification is made upon nine 
four-point scales, the points on which are 
more or less behaviorally defined in terms of
present employees. The points are: Low, 
Average, Above Average, Outstanding. The 
traits, too, are behaviorally defined as well as 
possible and are: Inquiring Mind, Oral 
Expression, Written Expression, Creative 
Ability, Perseverance, Drive, Interest, Intel- 
ligence, and Judgment. 

The applicants were being considered for an 
executive training program in which emphasis 
was upon sales and advertising. Typically, 
each applicant was interviewed by four 
executives. Each executive conducted his 
interviews and prepared his reports independently.
The decision to hire was made on the
basis of a consensus of interviewers’ judgments
as reflected in their reports which were 
reviewed and collated by the general manager 
of the company. 

All such interviews conducted during the 
calendar year 1949 constitute the basic data. 
Eighty-two screened applicants were interviewed
and 318 interviews were conducted.
The interviews contained 2,562 analyzable 
statements, in support of judgment quantifica- 
tions. 

The group of eighty-two applicants was 
dichotomized according to ultimate disposition 
by the company into an accepted group (N
= 37) and a rejected group (N = 45). Statements
in support of identical judgment
quantifications, trait by trait, were then 
analyzed and compared. For example, all 
statements in support of the rating Low on 
Inquiring Mind in the accepted group were 
compared with all statements in support of 
the same rating on the same trait in the rejected 
group.

Since this study was conceived and executed
well after the interviews were conducted, the 
interviewers had no knowledge that their 
reports were to serve other than the usual 
purpose of aiding in selecting personnel. 

These hypotheses were set by the inves- 
tigator: 


1. In comparing the accepted and rejected 
groups, the evidence in support of identical 
judgments within the interview will contain 
significantly more units of negative evidence for 
the rejected group. 

2. In comparing accepted and rejected
groups, the number of idea units cited in 
support of identical judgments within the 
interview will be significantly different. 

3. In comparing accepted and rejected 
groups, the number of evidence units cited in 
support of identical judgments within the 
interview will be significantly different. 


An idea unit was arbitrarily defined by the 
investigator as a concept expressed in one or 
more terms. An evidence unit was defined as 
a statement by the interviewer with respect 
to any activity in the applicant’s past history, 
any projected activity stated by the applicant, 
or any of the applicant’s behavior during the 
interview. An interviewer may have utilized 
one or several idea units in presenting one 
evidence unit, or he may have used one or 
more idea units to express a personal opinion 
not evidential in character. A negative evidence
unit was a unit of evidence which was
cited by the interviewer to demonstrate a
weakness in the applicant with respect to a
given trait.

Table 2

Model for Analyses of Data Involved in the Hypotheses
Concerning Evidences of Halo

          Low      Average    Above Average    Outstanding
          M_r        M_r           M_r             M_r
          M_a        M_a           M_a             M_a
          Dif.       Dif.          Dif.            Dif.
          σ_D        σ_D           σ_D             σ_D
          t          t             t               t
          p          p             p               p

Note: Subscript r = rejected group; subscript a =
accepted group. This model was used for the total
data as well as for the breakdown by each trait singly.

W. J. E. Crissy and James J. Regan

Table 2 depicts the model followed for the 
statistical analyses of the data. Hypothesis 
1 involved a one-tail test of significance, 
hypotheses 2 and 3 required a two-tail test. 
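
The quantities in the model of Table 2 (the two means, their difference, the standard error of the difference, and t) can be sketched for a single rating category as follows. The counts are hypothetical, and the unpooled form of the standard error is an assumption, since the report does not state which form was used.

```python
import math
from statistics import mean, variance


def compare_groups(rejected, accepted):
    """Return M_r, M_a, their difference, its standard error, and t.

    `rejected` and `accepted` are per-report counts (for example, negative
    evidence units cited in support of one rating on one trait).
    """
    m_r, m_a = mean(rejected), mean(accepted)
    difference = m_r - m_a
    # Standard error of the difference between two independent means
    # (unpooled variances; an assumed choice).
    sigma_d = math.sqrt(variance(rejected) / len(rejected)
                        + variance(accepted) / len(accepted))
    return {"M_r": m_r, "M_a": m_a, "Dif": difference,
            "sigma_D": sigma_d, "t": difference / sigma_d}


if __name__ == "__main__":
    # Hypothetical counts of negative evidence units per report.
    result = compare_groups([3, 4, 5, 4, 4], [1, 2, 3, 2, 2])
    print(result)
```

The value of t would then be referred to a table of t, with a one-tail probability for hypothesis 1 and a two-tail probability for hypotheses 2 and 3.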


Results 


The detailed results have been reported in 
Regan’s dissertation. These were the high- 
lights of the findings. With respect to hypoth- 
esis 1 regarding negative evidence, for the 
middle two judgment quantifications (Average 
and Above Average), constituting 85% of all 
judgment quantifications, the rejected group 
received significantly more negative evidence 
when the interview reports were considered 
as a whole. With the exception of two of the
nine traits, this same result obtained for either 
or both the middle two judgment quantifi- 
cations when the data were broken down trait 
by trait. 

Hypothesis 2 regarding idea units was 
rejected. In none of the categories for the 
reports as wholes and for the data broken 
down by traits was the difference between the 
rejected and accepted groups significant. 

In the case of hypothesis 3 regarding 
evidence units the data warranted an accept- 
ance of the hypothesis. Considering the form 
as a whole, the interviewers presented significantly
more evidence in support of a “Low”
quantification when the applicant was in the 
accepted group than to justify the same 
“Low” quantification for applicants in the 
rejected group. Conversely, the interviewers 
apparently found it necessary to present 
significantly more evidence for an “Outstand- 
ing” quantification for individuals in the 
rejected group than for the same “Outstand- 
ing” quantification for applicants in the 
accepted group. 


Summary 


“Halo effect” when systematic with raters
admits of empirical demonstration. Two 
methods for doing this are suggested, one of 
them new. Both seem feasible especially
when dealing with ratings of applicants
accorded by employment interviewers. The
first method involves a correlational analysis 
of each rater’s judgment quantifications in 
relation to the correlational matrix for the 
population of which these judgments are a 
sample. The second is based upon an analysis 
of the evidence cited by raters to support their 
judgment quantifications. Using the latter 


method in an exploratory study it was found 
that quantity of evidence and quantity of 
negative evidence show promise as indices of
systematic halo. 


Received July 26, 1951. 
Early publication. 


References 


1. Allport, G. W. Personality. New York: Henry Holt
and Company, 1937. 

2. Bingham, W. V. Halo, invalid and valid. J. appl. 
Psychol., 1939, 23, 221-228. 

3. Thorndike, E. L. A constant error in psychological 
ratings. J. appl. Psychol., 1920, 4, 25-29. 










Neighborhood Reaction to Public Low-Rent Housing * 


Kenneth E. Clark and Charles E. Swanson 
University of Minnesota 


The factors which determine how a group 
will react to a proposal for change are, at best, 
poorly understood. Even when the group 
reaction can be forecast, it is difficult to 
predict on which side of the controversy a 
given individual will be found. When the 
issue is one on which members of a neighbor- 
hood group split sharply and feel strongly, 
and when it is related to as controversial an 
issue, nationally, as the development of public 
low-rent housing units, it seems profitable to 
examine the neighborhood reaction in some 
detail. 

In the spring of 1950, in a large midwestern 
city, plans were announced for the erection of
a public low-rent housing project, to be built 
with federal funds on a site flanked on one 
side by industry, and on the other by a fairly 
close-knit community somewhat above aver- 
age, economically. The immediate reaction to 
the announcement came from the more civic 
minded members of the community who 
attended the annual meeting of the neighbor- 
hood association at which the Director of the 
local housing authority described the project 
and answered innumerable questions. At the 
conclusion of this two-hour presentation- 
discussion, the association with 200 in attend- 
ance voted overwhelmingly in favor of the 
project with a scattered minority of “No” 
votes. 

There developed, soon after this meeting, 
however, a movement of opposition to the 
project as its implications (or possible implica- 
tions) for the neighborhood were discussed. 
A series of small meetings of opponents and a 
larger meeting, mostly of proponents, was 
climaxed in a hearing before the city Council, 
which formally approved the project and the 
site. The survey of opinions of neighbors,
summarized in this article, was made shortly
after this decision. It should be mentioned
that the formal action of the city fathers did
not have the effect of making this matter a
dead issue in the community!

* The writers are indebted to the Minneapolis Star
and Tribune under a grant to the Research Division of
the School of Journalism, and to the Graduate School
of the University of Minnesota for financial support of
the project described herein, and to Messrs. Wesley G.
Booman and Sherwood Benson for their field supervision
of the interviewers used in this project.


The Survey Procedure 


A 44-item questionnaire was developed 
covering a variety of aspects of opinions toward 
the projected housing development. Ques- 
tions covered such topics as information about 
the nature of the specific development, at- 
titudes favorable to or opposed to the develop- 
ment, estimates of the effect of the develop- 
ment on such things as property values, 
taxes, crowding in schools, noise, traffic 
problems, and attitudes towards the pros- 
pective tenants to be housed in the develop- 
ment. In addition, questions were asked 
about the participation of each respondent in 
community meetings about the project.¹

This questionnaire was pretested on a fixed 
address sample of twenty-five respondents, 
and was, after revision, used by thirteen 
professional interviewers with another fixed 
address sample of 196 dwelling units.² These
units were selected after listing all dwelling 
units in the neighborhood by using the address 
section of the city directory. For the imme- 
diate vicinity of the proposed housing develop- 
ment, every other dwelling unit was selected. 
For the remainder of the area, every sixth 
dwelling unit was selected. 
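
The selection rule just described (every other dwelling unit near the site, every sixth unit elsewhere) is a form of systematic sampling from a listing. A minimal sketch follows; the address lists are hypothetical stand-ins for the city directory listing, and the starting position is an assumption.

```python
def every_kth(units, k, start=0):
    """Select every k-th unit from an ordered listing, beginning at `start`."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return units[start::k]


if __name__ == "__main__":
    # Hypothetical listings; the real listing came from the city directory.
    near_units = [f"near-{i}" for i in range(1, 21)]   # 20 units near the site
    far_units = [f"far-{i}" for i in range(1, 61)]     # 60 units elsewhere

    sample = every_kth(near_units, 2) + every_kth(far_units, 6)
    print(len(sample))  # 20
```

Because the two areas are sampled at different rates, any estimate for the neighborhood as a whole drawn from such a sample would need to allow for the heavier sampling near the site.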

Interviewers were instructed to interview
a responsible adult in each of the dwelling
units assigned to them, to attempt to obtain
a fifty-fifty split between men and women, and,
in the event no one was at home at a given
address, to make two call-backs to attempt to
complete the interview. Interviewers received
their instructions on a Monday morning,
and were to complete their assignments by
Thursday morning. Additional call-backs
were made after this date to interview refusals
and persons still not at home after three calls.
Final box score on interviewing was: (1) Total
number of dwelling units listed, 196; (2)
Number of interviews completed, 188 (95.9
per cent); and (3) Interviews not completed,
8 (4.1 per cent).

¹ A copy of the questionnaire, with responses to each
question broken against over-all opinion toward the
development, has been deposited with the American
Documentation Institute. Order Document No. 3140
from the American Documentation Institute, 1719 N
Street, N.W., Washington 6, D. C., remitting $1.00
for microfilm (images 1 inch high on standard 35 mm.
motion picture film) or $1.00 for photocopies (6 × 8
inches) readable without optical aid.

² The Parten method of fixed addresses to insure a
bona fide probability sample was followed. See Mildred
B. Parten, Surveys, polls, and samples, New York:
Harpers, 1950.

Reasons for non-completion of interviews 
were: (1) Vacationing, 4; (2) Refused to be 
interviewed, 1; (3) Respondent never found 
at home, 1; (4) Newly-arrived summer 
tenant, 1; and (5) Householder in hospital, 1. 
Total N, therefore, in the remainder of this 
report, will be 188. 

Checks to verify interviewing were made by 
telephoning every sixth respondent. 


Results 


The degree of conflict generated in the 
community by the issue of low-rent housing 
in the area is indicated by the responses obtained
to the two questions: “Do you favor
or oppose the construction of this new development?”
and “How strongly do you feel about
this?” Responses are shown in Table 1.

Table 1
Opinions Toward Low-Rent Housing Project

                                              No Opinion
                          Favor    Oppose    or Qualified
Total Group: Number         73       71          44
             Per Cent       39%      38%         23%
By Intensity of Feeling:
  Very strongly             42%      49%          0%
  Rather strongly           33       39           0
  Not strongly at all       24       12          39
  No answer                  1        0          61
                           100%     100%        100%

The percentages in Table 1 indicate the 
closeness of the division between the proponents
and opponents of the development, and
the fact that the views of both groups were 
held strongly. Rather surprisingly, in view of 
widespread and repeated newspaper publicity, 
group meetings on the issue, and innumerable 
private discussions, almost one-fourth (23 per 
cent) had not yet taken a definite position at 
the time of the survey. 

Another comparison of interest is produced
by splitting the sample into two groups, one 


Table 2 
Percentage Distribution of Opinions Toward Low-Rent Housing Project Classified by Location 


and by Intensity of Feeling 








Favor 


No Opinion 


Oppose or Qualified 





By Location: 
Near Group (N = 67) 
Far Group (N = 121) 

By Location and Intensity: 
Near Group: 

Very strongly 
Strongly 


36% 


Not strongly at all 
Total 


Far Group: 
Very strongly 
Strongly 
Not strongly at all 


Total 





45% 19% 
34 26 


47% 0% 
40 0 
100 


100 


0% 
0 
100 


100 
Table 3
Reported Incomes of Respondents

Income                          Per Cent
$6,000 per year or more            20%
$5,000 to $5,999 per year          13
$4,000 to $4,999 per year          20
$3,000 to $3,999 per year          19
$2,000 to $2,999 per year          16
$1,000 to $1,999 per year           7
0 to $999 per year                  1
No answer                           4
                                  100%

Total                             188





made up of persons living within about two 
blocks of the boundaries of the projected 
development, the “Near” group, and the 
other made up of the remainder of the sample, 
the “Far” group. The Near group included 
67 of the 188 respondents. The responses 
of the two groups to the two questions on 
opinion toward the development and the 
intensity of their feeling, are shown in Table 2. 

The results in Table 2 were unexpected. 
In the first place, there is very little difference 
between the near and far groups on the propor- 
tion favoring or opposing the project. In the 
second place, both near and far groups hold 
their opinions in favor of or in opposition to 
the project with about equal intensity. In
this particular situation, we may safely 
conclude that nearness to the proposed 
development is not an important factor in 
determining pro or con attitudes. 


Since this community represents a wide
variety of occupational groups, and hence
differs considerably within itself in
socio-economic status, one question on income was
asked of all respondents. Reported incomes
are shown in Table 3.

Income varies from less than $1,000 per
year to more than $6,000. The median income
is slightly more than $4,000 per year which
exceeds by only $1,600 the maximum of 
$2,400 permitted for those eligible to reside in 
the housing project. Approximately 16 per 
cent of the respondents report incomes below 
this maximum. One might expect therefore 
that low income respondents would favor the 
project whereas high income respondents would 
oppose the project. To test this hypothesis, 
the respondents were grouped into three 
income classes: $5,000 and over; $3,000 to 
$4,999; and below $3,000 per year, and the 
opinions of these three groups toward the 
project tallied. The results are shown in 
Table 4. 
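
The three-way income grouping can be cross-checked against the full sample. The sketch below is illustrative only; it assumes the class sizes listed in Table 4 (62, 74, and 45) and the total of 188 respondents, and the names used are not from the study.

```python
TOTAL_RESPONDENTS = 188

# N for each income class, as listed in Table 4.
income_classes = {
    "$5,000 and over": 62,
    "$3,000 to $4,999": 74,
    "Below $3,000": 45,
}

classified = sum(income_classes.values())
# Respondents left over are those who gave no income answer.
no_income_answer = TOTAL_RESPONDENTS - classified

class_shares = {
    label: round(100 * n / TOTAL_RESPONDENTS)
    for label, n in income_classes.items()
}
no_answer_share = round(100 * no_income_answer / TOTAL_RESPONDENTS)
```

Under these assumptions seven respondents are unclassified, about 4 per cent of the sample, which matches the "No answer" row of Table 3.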

Income level, surprisingly enough, is shown 
not to be one of the primary factors in deter- 
mining attitudes toward this housing develop- 
ment. The highest income group is actually 
more in favor of this project although the 
difference is so slight as to be of little or no 
importance. Additional analyses, not pre- 
sented here, indicate that intensity of feeling 
is also unrelated to differences in income levels. 

A further breakdown of the percentages 
already presented is possible if one uses the 
response to the question, “Have you attended 
any meetings to discuss this development?” 
This breakdown, shown in Table 5, reveals 
an interesting relationship between attendance 
at meetings and intensity of feeling. It is 
apparent, from Table 5, that those persons who 
attended meetings (this question was appar- 
ently interpreted by all respondents to refer 
to two scheduled large public meetings at 
which housing authorities and others spoke), 
uniformly felt strongly on the housing issue, 
while those who did not attend varied con- 
siderably in their intensity of feeling. It is 
not possible to determine whether attendance 
at the meetings had any effect upon the pro 
or con attitude toward the housing develop- 
ment, but the data show that no one left 


Table 4
Opinions Toward Low-Rent Housing Project According to Income Level

                                                         Qualified or
Income Level                   N     Favor    Oppose     No Opinion
$5,000 per year and up        62      44%                   21%
$3,000 to $4,999 per year     74      39       35           26
Less than $3,000 per year     45      35       43           22
Table 5
Opinions Toward Low-Rent Housing Project According to Attendance or
Non-Attendance at Public Meetings

                               Attended Meeting                Attended No Meeting
                                           No Opinion                        No Opinion
                          Favor   Oppose   or Qualified     Favor   Oppose   or Qualified
Total Group: Number         16       8          0             57      63         44
By Intensity of Feeling:
  Very strongly             88%     87%                       30%     44%         0%
  Rather strongly           12      13                        39      43          0
  Not strongly at all        0       0                        30      13         39
  No answer                  0       0                         1       0         61
                           100%    100%                      100%    100%       100%





those meetings without formulating an opinion. 
The results on this point are clear-cut even 
though based on a total of only 24 respondents. 
Furthermore, if only the meeting-goers in this 
sample were used, the vote for the housing 
development would not be a near fifty-fifty 
split, but would be a two-thirds majority in 
favor of the development (16 out of the 24 
persons attending meetings). 
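
The contrast between the meeting-goers and the whole sample can be written out as a short calculation. This sketch assumes the counts reported in Tables 1 and 5 and, as an assumption of the sketch itself, leaves the "no opinion or qualified" respondents out of the denominator; the helper name is illustrative.

```python
from fractions import Fraction


def favor_share(favor, oppose):
    """Share in favor among respondents who took a definite position."""
    return Fraction(favor, favor + oppose)


# Meeting-goers: 16 in favor, 8 opposed (Table 5).
meeting_goers = favor_share(16, 8)

# Whole sample: 73 in favor, 71 opposed (Table 1).
whole_sample = favor_share(73, 71)

print(meeting_goers, float(whole_sample))
```

The first share reduces to exactly two-thirds, while the second stays close to one-half.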

Since the greatest differences in responses 
on the questionnaire were found between the 
small group of persons who had attended 
meetings and those who had not, it is important 


to examine the responses of these two groups 
to a few of the other questions in the question- 
naire in order to disclose, if possible, the 
determinants of attitude in this particular 
situation. The questions themselves are 
grouped below into those dealing with “pre- 
dictions of things to come” and with questions 
requiring knowledge of facts about the project. 


Questions Involving Prediction 


“Do you think your property values will 
go up, go down, or stay about the same as a 
result of this housing development?” 














“Go Up” “Go Down” “Stay Same” 





Total Group 
Favor Project: 
Attended meeting 
Attended no meeting 
Oppose Project: 
Attended meeting 
Attended no meeting 
No opinion or qualified opinion on project 





35% 50% 


0 88 
9 74 


0 
72 17 
18 61 19 





“Do you think that this unit will bring undesirable people into the neighborhood and 


schools?” 





“Yes”





Total Group 
Favor Project: 
Attended meeting 
Attended no meeting 
Oppose Project: 
Attended meeting 
Attended no meeting 
No opinion or qualified opinion on project 
“Will the construction of this development have any effect on your long term plans to
stay in this neighborhood or move out of it?”








N 


“No” 





Total Group 
Favor Project: 
Attended meeting 
Attended no meeting 
Oppose Project: 
Attended meeting 
Attended no meeting 
No opinion or qualified opinion on project 


0%


75 
89 


0 
32 
84 


54 
it 








Responses to the above three questions
reveal only small differences between those who
attended meetings and those who did not attend
meetings. But large differences are disclosed
between proponents and opponents. It would
thus appear that opposition to the project is
based upon strong feelings that such a housing
project will bring an undesirable element into
the neighborhood which in turn will destroy
property values.

The following questions were asked at different
times during the interview to get some
estimate of the amount of information possessed
by each respondent:


1. “About how many families do you understand
will be housed in this development?”
Correct answer was “120 families.”

2. “How much a month will be charged for
rent, heat and utilities for one of these
units?” Correct answer was “about $36
per month.”

3. “How many stories will the units have?”
Correct answer was “two stories.”

4. “What is the most money a family can
make a year and still rent a place in
the development?” Correct answer was
“$2,400 per year.”

5. “If an undesirable family gets into this
development will the housing authority
be able to get them out?” Correct
answer was “Yes.”


The percentages of correct responses for
each subgroup are reported by question
number in the following tabulation:


                                               N
Total Group                                   188
Favor Project:
  Attended meeting                             16
  Attended no meeting                          57
Oppose Project:
  Attended meeting                              8
  Attended no meeting                          63
No opinion or qualified opinion on project     44

No. 1 No. No. 4
30% 12% 15%
69
21
63
14
87
25
23
62
27


It is apparent that these questions which 
require knowledge for a correct answer reveal 
large differences in the responses of those who 
attended meetings in contrast to those who 
did not attend meetings, except for Question 5, 
but only small differences between proponents 
and opponents of the development emerge. 
The correct answers to Questions 1, 2, 3, and 
4 were given at the first general meeting and 
in the newspaper articles but the correct 
answer to Question 5 was given only at the
second general meeting attended chiefly by 
proponents. 

The evidence is clear-cut that attendance at 
meetings where one-hour presentations were 
made including visual aids followed by an 
hour-long question and answer period was 
more effective as a means of informing the 
neighborhood than were the series of articles 
and editorials that had appeared piecemeal 
in the daily press for a period of several
months. 


Summary and Discussion 


When neighbors of a proposed low-rent 
housing development were queried to learn 
their reactions to the project, it was found: 


1. About as many neighbors favored as 
opposed the project, with a rather substantial 
proportion (23%) having no opinion or giving 
a qualified response. 

2. The intensity with which these opinions 
were held was greatest for those persons who 
had made up their minds, and especially for 
those who had attended meetings to discuss 
the nature of the project and implications for 
the neighborhood. The opponents of the 
project held their opinions somewhat more 
strongly than did the proponents. 

3. Persons who felt strongly and who 
attended meetings to discuss the project were 
much better informed on matters of fact about 
the project, than were persons feeling less 
strongly and not attending meetings. This 
result was found in spite of adequate coverage 
of the controversy on the project by the local 
press and radio. 

4. Persons who favored the project differed 
markedly from those who opposed in their 
expectations of the influence of the project on 
the future development of the neighborhood, 
on schools, on property values, taxes, and the 
like. 


These observations have definite implica- 
tions for officials responsible for the intro- 
duction of public low-rent housing into 


established communities. The importance of 
adequate provision for essential and correct 
information cannot be emphasized too much, 
in the light of the very poor responses of 
persons in this sample, especially the responses 
of those persons who apparently considered 
themselves well enough informed to make 
up their minds on the issue. This lack of 
information is all the more discouraging in 
view of the fact that a large percentage of the 
residents in the area are members of the staff 
of a nearby university, and are, presumably, 
well-read individuals. 

This study does not provide any basis for 
planning the details of the sort of informational 
campaign apparently required to educate a 
community on such a controversial subject. 
It does suggest, however, that the ordinary 
mass media are not adequate for the job,
except, possibly, for those persons with intense 
feelings on the issue. It would appear that 
a number of community meetings is desirable, 
not only for providing correct information but 
for mobilizing opinion. The local housing
authorities involved in the controversy de- 
scribed in this paper could probably have 
reduced the amount of disturbance by conduct- 
ing a number of well-publicized meetings with 
members of the community, by anticipating 
the negative reactions of some of the residents, 
and by providing, perhaps through attractive 
pamphlets, or the like, the information needed 
to reduce the anticipation of disastrous results 
on the neighborhood that might arise from a 
low-rent public housing project. 


Received November 9, 1950. 
The Coefficient of Agreement in Determining Product Preferences


Benjamin Balinsky and Milton L. Blum 
The City College of New York 


and 


Solomon Dutka 
Dun and Bradstreet 


The coefficient of agreement is a statistical 
technique that can be used advantageously in 
the analysis of data obtained from many 
different types of research problems in psy- 
chology. To the best of our knowledge, it has 
not been previously applied in psychological 
publications. The object of this paper is to 
familiarize the reader with the technique and 
illustrate its use. 

The coefficient of agreement is most appro- 
priately used in conjunction with the method 
of paired comparisons. It enables the experimenter
to measure the degree and test the
significance of agreement among observers as 
to their preferences for a series of items offered 
for consideration. Unlike the method of 
ranks, this technique does not depend upon 
any assumptions of “linearity” in the variate 
being measured. Because of this fact, the 
method is advisable when subjective items 
such as preferences for advertisements, tastes 
in music, preferences for food, clothes, and so
on are measured. It can also be readily used 
in the construction and testing of attitude 
and opinion scales. 

In this report the preferences of a jury of 
observers for four types of ice cream were 
determined by the paired comparisons method 
and the results tested by using the coefficient 
of agreement. 

The method of “paired comparisons” is a 
general method for investigating preferences or 
rankings by individuals. It consists of offering 
for judgment to an observer or observers all 
possible pairs of items that are to be considered 
according to the different degrees in which they 
exhibit some common quality. Whenever this 
quality is measurable in an objective fashion, 
these items will yield a number of variate 
values whose distribution can usually be 
readily treated. However, it often happens 
that the common quality is not objectively 
measurable according to a known linear scale 


and we have to rely on the subjective judg- 
ments or preferences among observers for 
comparisons. 

The method of ranks has been most widely 
used for this type of comparison. It consists 
of having an observer or observers arrange a 
series of items, A, B, C, . . . etc. in the order 
in which they appear to possess different 
degrees of the common quality under examina- 
tion. Such a treatment, however, is invalid 
whenever the quality considered does not 
lend itself properly to a ranking treatment. 
For example, to require a group of observers 
to rank a series of magazine advertisements 
according to their preference begs the question 
as to whether they can unbiasedly make such 
a choice. Obviously, if they rank A over B 
and B over C it forces the assumption that A 
is likewise preferred to C. However, for 
subjective preferences in advertisements, foods 
or other such items, each of which is really a 
multi-variate concept, it is not entirely
impossible that an observer will prefer A to B, 
B to C, and C over A. Not only is it possible 
for such an “inconsistency” to occur, but any 
ranking method which fails to consider this 
possibility may yield a distorted picture of the
“true” preference pattern and the variation of
the common quality among the items con- 
sidered. 


The Coefficient of Agreement 


By using the method of paired comparisons 
to obtain data, such “inconsistencies” as the 
preference of A over B, B over C, and C over 
A can be considered statistically by means of 
the coefficient of agreement. 

In using the coefficient of agreement in
conjunction with the method of paired comparisons,
each of m observers is asked to
express his preference, if any, among each of
the $\binom{n}{2}$ pairs of n items in question. This will
yield a matrix of n (n - 1) cells indicating the
results of these choices. Thus, if in the 
“paired” comparison of items A and B, item A 
is preferred to B we enter a unit in row A 
column B. Were B preferred to A we would 
similarly enter a unit in row B column A. 
If there is no indication of a preference for 
either item the unit is divided equally between 
each of the two cells. 
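
The tallying rule just described can be sketched in Python. This is a minimal illustration, not the authors' procedure; the function name, the tuple layout of a judgment, and the sample judgments are all assumptions made for the example.

```python
def preference_matrix(items, judgments):
    """Tally paired-comparison judgments into an n x n matrix.

    Each judgment is (first, second, preferred), where preferred is one of
    the two items, or None when the observer states no preference.
    Cell [row][column] counts how often the row item was preferred to the
    column item; a tie puts half a unit in each of the two cells.
    """
    index = {item: k for k, item in enumerate(items)}
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for first, second, preferred in judgments:
        a, b = index[first], index[second]
        if preferred is None:
            matrix[a][b] += 0.5
            matrix[b][a] += 0.5
        elif preferred == first:
            matrix[a][b] += 1.0
        elif preferred == second:
            matrix[b][a] += 1.0
        else:
            raise ValueError("preferred must be one of the paired items or None")
    return matrix


# Hypothetical judgments for three items.
items = ["A", "B", "C"]
judgments = [
    ("A", "B", "A"),
    ("A", "B", "B"),
    ("A", "B", None),
    ("B", "C", "B"),
    ("A", "C", "C"),
]
matrix = preference_matrix(items, judgments)
```

In this hypothetical tally the A-over-B cell and the B-over-A cell each hold 1.5, one full unit plus half of the tied judgment.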

If all observers were in complete agreement
as to their preferences for the series of items
in question there would be $\binom{n}{2}$ cells containing
the number m (i.e., the total number of observers)
on one side of the diagonal and $\binom{n}{2}$ cells
with no frequencies on the other side. A
measure of the degree of agreement is given
by M. G. Kendall and B. Babington Smith 1 as

$$u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1$$

where
u = coefficient of agreement
$\Sigma$ = sum of the number of agreements
between pairs of judges
m = the number of observers
n = the number of items


If there is complete agreement u has a
maximum value of 1. It has a minimum value
of $-\frac{1}{m-1}$ (for m even) and $-\frac{1}{m}$ (for m odd).
Its minimum value is -1 only for m = 2. This
is to be expected since there cannot be complete 
disagreement among three or more observers 
in paired comparisons. The coefficient can be
tested for significance by comparing its value 
to the value which could have arisen by chance 
if the preferences had been assigned at random 
with respect to the quality being considered. 
Distributions for various values and combinations
of m and n have already been worked
out (1). In practice, for higher values of m
and n a close approximation is afforded by
the $\chi^2$ distribution with $\nu$ degrees of freedom.

$$\chi^2 = \frac{4}{m-2}\left[\Sigma - \frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m-3}{m-2}\right]$$

$$\nu = \binom{n}{2}\frac{m(m-1)}{(m-2)^2}$$

1 Kendall, M. G., and Smith, B. Babington. “On
the Method of Paired Comparisons,” Biometrika, 1940,
31, 324-345.
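
The coefficient and its large-sample test can be sketched in Python. The sketch assumes the Kendall and Babington Smith definitions: u = 2Σ / (C(m,2) C(n,2)) - 1, with Σ taken as the sum of y(y - 1)/2 over every off-diagonal cell y of the preference matrix, χ² = 4/(m - 2) [Σ - ½ C(n,2) C(m,2) (m - 3)/(m - 2)], and ν = C(n,2) m(m - 1)/(m - 2)². The function names and the sample matrices are hypothetical, and applying the same expression to fractional (tied) cells is an assumption of the sketch.

```python
from math import comb


def agreement_sum(matrix):
    """Sigma: agreements between pairs of judges, summed over off-diagonal cells."""
    n = len(matrix)
    return sum(
        matrix[i][j] * (matrix[i][j] - 1) / 2
        for i in range(n)
        for j in range(n)
        if i != j
    )


def coefficient_of_agreement(matrix, m):
    """u = 2*Sigma / (C(m,2) * C(n,2)) - 1 for m observers and n items."""
    n = len(matrix)
    return 2 * agreement_sum(matrix) / (comb(m, 2) * comb(n, 2)) - 1


def degrees_of_freedom(n, m):
    """nu = C(n,2) * m(m-1) / (m-2)^2 for the chi-square approximation."""
    return comb(n, 2) * m * (m - 1) / (m - 2) ** 2


def chi_square(matrix, m):
    """chi^2 = 4/(m-2) * (Sigma - 0.5*C(n,2)*C(m,2)*(m-3)/(m-2)); needs m > 2."""
    n = len(matrix)
    expected = 0.5 * comb(n, 2) * comb(m, 2) * (m - 3) / (m - 2)
    return 4 / (m - 2) * (agreement_sum(matrix) - expected)


# Hypothetical data: 5 observers who all prefer A to B, A to C, and B to C.
unanimous = [[0, 5, 5], [0, 0, 5], [0, 0, 0]]
u_unanimous = coefficient_of_agreement(unanimous, m=5)

# Two observers who disagree on a single pair of items.
split = [[0, 1], [1, 0]]
u_split = coefficient_of_agreement(split, m=2)

# Four items judged by 100 observers.
nu_four_items = degrees_of_freedom(n=4, m=100)
```

Complete agreement gives u = 1, the two-observer disagreement gives u = -1, and four items with 100 observers give roughly 6.2 degrees of freedom.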

A practical but not statistical limitation of
the paired comparison technique is its unwieldiness
when too many items are to be
compared. Since the number of paired comparisons
offered to the observer increases
disproportionately as the number of items
offered for judgment increases, there is a
practical limitation to the number of paired
comparisons to be made, e.g., five items would
yield $\frac{n(n-1)}{2}$ or 10 comparisons, ten items
would similarly yield 45.
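
The growth in the number of pairs is easy to check by enumeration. The short Python sketch below is illustrative; the helper name is an assumption, and the items are stand-in integers rather than real stimuli.

```python
from itertools import combinations


def paired_comparisons(items):
    """Return every unordered pair of items, n(n - 1)/2 pairs in all."""
    return list(combinations(items, 2))


pair_counts = {n: len(paired_comparisons(range(n))) for n in (4, 5, 10)}
for n, count in pair_counts.items():
    print(f"{n} items -> {count} paired comparisons")
```

Four items give six pairs, five items give ten, and ten items give forty-five.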


The Ice Cream Preference Test 


Manufacturers of ice cream differentiate 
between two types: French and Domestic. 
The former has an egg content while the latter 
does not. In addition, each of these types of 
ice creams can differ in butter fat content. 

The problem posed by the manufacturer was 
whether a “jury” has preferences for ice 
creams differing in the stated characteristics 
and if so, whether a significant order of 
preferences exists.

Four kinds of ice cream were served: two 
kinds of French ice cream, one with high and 
the other with low butter fat content, and a 
high and low butter fat Domestic. While the 
exact proportion of butter fat content is a 
“manufacturer’s secret,” it was arranged that
the low butter fat French be equal to the 
high butter fat Domestic. 


All the ice cream served was vanilla since it 
was believed that flavor had to be held constant 
for our purposes. The subjects were not told 
how many kinds of ice cream were being served. 
The size of the portions was held relatively 
constant as were the temperature and solidity 
of the ice creams used. 

One hundred students served as subjects 
partly because of their willingness and also 
because of their convenience for the experi- 
menters. Each of these subjects was presented 
with two portions of ice cream in ordered 
sequence at any one time. Separate plates 
and spoons were used throughout the experiment.
The subjects were requested to taste
the ice cream in the order presented and were
told they could eat all or part of the portions 
served. They were encouraged to reach a de- 
cision and to state their preferences for either 
ice cream A or B. 

It was necessary to have six trials, since the 
paired comparison method was used and four 
ice creams were to be compared with each 
other. It was then possible to have each sub- 
ject state his preference for each ice cream 
when compared with every other. Actually 
he tasted each ice cream three times in the 
experiment. 

Position and serial presentation were neu- 
tralized by the experimental design. The 100 
subjects were divided into eight groups and 
each group tasted six pairs of ice cream in 
different positions and serial order. In the 
design, any one ice cream was presented either 
first or second in a pair an equal number of 
times. This was done to eliminate preference 
due to position in the trials. In addition each 
ice cream was presented in different serial order 
on successive trials so that each ice cream had 
an average serial order of approximately 3.5— 
i.e., in the middle of the series. 

After the eating experiment took place, each 
subject was handed a questionnaire to deter- 
mine his knowledge about differences in ice 
cream and also to determine what relationship 
existed between verbalizations and actual taste 
preferences. 


Results 


To test the significance of the order of 
preferences the coefficient of agreement was 
employed. The matrix and calculations are 
given below.

The matrix in Table 1 may be read as 
follows: 46.5 students preferred French low 
butter fat over French high butter fat and, 
conversely, 53.5 students preferred French high 
butter fat. The fraction results from assign- 
ing those with no preferences evenly to each 
category. Also, 61.0 preferred French low 
butter fat over Domestic low butter fat and,
conversely, 39.0 preferred the Domestic low
butter fat.







Substituting the values in Table 1 in the 
preceding formulae we find: 


$$\chi^2 = 63.7, \qquad \nu = 6.2, \qquad P < .001$$


The order of preference is shown in Table 1.
The degree of agreement among the observers 
as to the order of preference is significant. 

Other tests of significance show that the 
low butter fat French is preferred over each of 
the Domestics and the high butter fat French 
is preferred over the high butter fat Domestic. 
There is no clear cut difference between the 
two French ice creams. 

The results also clearly demonstrate that the 
indicated preferences cannot be interpreted as 
caused by either position or serial order. 
Regardless of the ice cream presented, position 
1 was chosen 45.4% of the time; position 2 
was chosen 40.2% of the time and no preference 
was indicated 14.5% of the time. (In all 600 
choices were possible.) Tests for significance 
indicate that position had no significant effect 
on preference. 

It was assumed that serial order might have 
an effect on the ability of the subjects to make 
a preference and so this factor was controlled 
experimentally. Too much ice cream might 
make selection more difficult or as one subject 
at the end of six trials stated, “My taste buds
are frozen.” In comparing the choices of the
four pairs of the first trial in the serial order 

2 $\Sigma$ may be transformed as follows for computational
purposes: $\Sigma = \sum y^2 - m \sum y + \binom{m}{2}\binom{n}{2}$, where the
summation of the individual scores (y) now takes place
over the half of the matrix below the diagonal.


Table 1
Matrix of Preferences Among Various Ice Creams

Type of Ice Cream                 A        B        C        D
A. French low butter fat          -       46.5
B. French high butter fat        53.5
C. Domestic low butter fat       39.0
D. Domestic high butter fat      23.5
Total                           116.0
that were identical with the four pairs of the
last or sixth trial, we find that the preference 
for the specific ice cream was always in the 
same direction, and such differences as occurred 
were due to chance. 

Analysis of the questionnaire indicates that 
51 per cent of the group stated a preference for 
French ice cream, 22 per cent for Domestic, 
and 27 per cent stated no preference. How- 
ever, all three groups in taste tests actually did 
prefer French ice cream. While a stated 
preference for “French” is positively related 
to results of taste tests, the stated preference 
for “Domestic” or “no preference” is not so 
related. 

Approximately one half or 52 out of 100 of 
the group stated an awareness that ice cream 
differs in butter fat content. When asked to 
estimate the amount of butter fat in ice cream, 
those who stated an awareness overestimated 
the amount. The estimated range of butter 
fat varied from 4 of 1 per cent to 90 per cent. 
The average minimum amount of butter fat 
estimated was 15%; the average maximum 
amount of butter fat estimated was 31%. 
Both minimum and maximum estimates were 
unrealistically high. This would indicate that 
consumers are not informed about this char- 
acteristic of ice cream. 

The qualitative estimate of difference in 
taste was that butter fat produced a “creamier”
ice cream and of those who stated a preference, 
most claimed they preferred ice cream with 
high rather than low butter fat. 

Another aspect of qualitative results was 
obtained upon analysis of the reasons for one’s 
preference for a specific ice cream. Here the 
results are not clear cut at all. Most often 
the same reason was given for the preference 
regardless of the ice cream chosen. For 
example, in one comparison, the majority
reason for preference was “tastier” and this 
applied to both ice creams in the pair. The 
most frequent reason why ice cream was 
preferred was verbalized as “creamier,” and 
this reason is given overwhelmingly whenever 
French is compared with Domestic. 

High butter fat was reported as being 
“creamier.” When butter fat was held con- 
stant in the comparison of the French and 
Domestic ice creams, “creamier” was still
given as the reason. Apparently “creamier”
is a good descriptive word for ice cream. 


Summary 


The method of paired comparisons is 
preferred to the method of ranks when sub- 
jective considerations make the assumption of 
“linearity” of the variate an untenable one.
The coefficient of agreement was described as 
a statistical refinement to measure the degree 
of agreement of preferences among observers 
and to test its significance. 

The use of this method was illustrated in a 
controlled experiment to determine the prefer- 
ences of 100 subjects for varieties of ice cream. 

The order of preferences obtained was 
significant. Other tests of significance in- 
dicated that the low butter fat French ice 
cream was preferred over the domestics and the 
high butter fat French over the high butter 
fat Domestic. 

Qualitative analysis of a questionnaire 
showed statements in favor of the French ice 
cream, overestimation of the butter fat 
content in ice cream, and the description of 
“creamier” most frequently given to the 
preferred ice cream. 


Received November 13, 1950. 








Television Owning and Its Correlates * 


Charles E. Swanson and Robert L. Jones 
University of Minnesota 


Much attention has focused recently on 
what television is “doing” to American 
audiences. Both academic and commercial 
research agencies have studied this new 
medium. Survey results, commentary, spec- 
ulation, and personal anecdotes about tele- 
vision have appeared in such widely varying 
outlets as Tide (24), Commonweal (19), 
Nation’s Business (12), The Saturday Review of 
Literature (3), Architectural Forum (23), and the
Journal of Applied Psychology (2). 

Most emphasis has centered on three areas: 
advertising impact of television (15, 25), 
influence of television set ownership on use of 
other media (2, 13, 20, 26), and how television 
affects family relationships and cultural stand- 
ards (1, 3, 6, 19). 

This study adds data on the relations of 
television set ownership to use of other media 
and to educational and income levels. It also 
is the first panel study to report differences 
between television set owners and non-owners 
in: 1. reading certain categories of news 
content; 2. extent of certain news interests; 
3. levels of information about governmental 
affairs; and 4. intelligence levels. 


Sample 


The primary purpose of this panel investiga- 
tion was to discover predictors of information 
levels in the area of governmental affairs. 
Four waves of personal interviews averaging 
about an hour each in length were conducted 
with a probability sample of Minneapolis 
adults in April and May, 1950. A wide 
variety of measures, self-estimates, and ratings 
was collected and then intercorrelated with 
each other and with the criterion tests. 


The panel design was similar to that used by 
Lazarsfeld, Berelson, and Gaudet (11). The 
sample followed an area probability design 


* Publication No. 2 under communications research 
at the University of Minnesota, sponsored by the Office 
of Naval Research (Contract N6ONR-246, T.O. 4) in a
grant to Research Division, School of Journalism, Uni- 
versity of Minnesota, under direction of Dr. Swanson. 
The project is entitled “Analysis of Factors Producing 
Individual Differences in Reading and Listening Habits 
and In Development of Information Levels.” 


described by Jessen, Blythe, Kempthorne, and 
Deming (8, 4). 

Because the study required that a measure of 
newspaper reading behavior be made within 
36 hours after the newspaper's publication, 
heavy mortality was expected. A base sample 
of 448 dwelling units was required for the first 
wave. These dwelling units were selected from 
106 blocks, obtained by random selection from 
the 1940 census block statistics for Minne- 
apolis (21). 

Representativeness of the 106 blocks was 
tested by projection to the total dwelling units 
and monthly rentals reported in the 1940 
census. Preliminary 1950 census data on total 
dwelling units also were used as criteria. These 
106 blocks provided an estimate of the 1940 
Minneapolis dwelling units accurate within 1} 
per cent. The number of dwelling units in 
Minneapolis in 1950 was estimated within 3 
per cent. By chi square, a probability of .93 
was obtained that differences between the 106 
blocks and 1940 Minneapolis monthly rentals 
were due to chance errors of random sampling. 

A census of the 106 blocks was made and the 
448 dwelling units selected at random. Inter- 
viewers called at the pre-designated dwelling 
units. An adult in each dwelling unit was 
selected at random by Kish’s method (9) 
and newspaper readership interviews with 350 
adults were completed on the first wave. 

Television-owning households and non-own- 
ing households were identified on the panel’s 
second wave. Since these households were 
selected for the panel from all Minneapolis 
dwelling units on a random basis, the propor- 
tion of television-owning homes and the char- 
acteristics of members of those households 
should be representative of television-owning 
homes in the city. 

A total of 31 television-owning households 
and 171 non-owning households was found in 
the surviving sample. Converting these to 
percentages, it appears that about 15 per cent 
of Minneapolis homes owned TV sets as of 
April, 1950. 

Since a probability sampling design was em- 
ployed, no attempt was made to match owners 
and non-owners on personal or social charac- 
teristics. The number of cases in each of the 
groups was not equal, but was in proportion 
to the number of sets in the population studied. 


Method 


Identical questions, tests, and stimulus 
materials were used to get data from respondents
in TV-owning households and in
non-TV-owning households, so that group comparisons
might be made. 


On the first wave of the panel, basic personal 
and demographic data on the respondents were 
collected and a readership interview was con- 
ducted on selected items in the April 19th 
Minneapolis Star. Two categories of content, 
governmental affairs news and comics, com- 
prised the selected items for the readership 
questions. 

On the second wave, respondents were given 
two tests of information in the governmental 
affairs area. A series of attitude questions was 
asked in this interview, and a media usage 
questionnaire was administered. Finally, re- 
spondents were given the Goossen Disguised 
Intelligence Test (5) which yields an estimate 
of a person’s intellectual level. 

The short form of the Goossen Test used in 
this study is composed of five items which 
survived rigorous item analysis. Fifty-five 
items were selected originally from such sources 
as the Revised Stanford Binet, the Wechsler-Bellevue
Scale, and the Cooperative Test Service
Tests in Social Studies and American History.
Some items also were devised by the
test author. All items were required to meet 
standards of normal language level, content 
within the experience of normal adults, phras- 
ing similar to public opinion poll questions, 
ease of scoring by relatively untrained persons, 
and permanence of item interest. As reported 
by Goossen, the Hoyt reliability of the five- 
item test is .63, and its validity, obtained by 
correlating it with the Pressey Classification 
Test, is .76. When years of education is com- 
bined with Goossen Test score, the multiple 
correlation with the Pressey Classification Test 
becomes .88. 

The Goossen Test differs markedly from the 
vocabulary test developed by Thorndike (17, 
18) for use in estimating intelligence of opinion 
poll respondents. Thorndike’s test was com- 
posed of single-word items and had a multiple 
choice answer arrangement. All but one of 
the Goossen items are open-ended questions in 
which the respondent must structure his own 
response. In Goossen’s opinion, the format of 
the Thorndike test makes it difficult to disguise 
it, and to fit it smoothly into a context of opinion 
questions. 

On the third wave, another readership
interview was conducted, this time on selected items in
the May 1st Star. On the fourth wave, respondents
estimated their interest in certain
kinds of news and comic materials presented to 
them in a booklet. 

The methodology of this study resembles the
“after-only” methods used in some investigations
of mass communications effects in World
War II (7). No “before” measures were available
for the variables under scrutiny for either
the owning or non-owning groups. It is best
under these methodological conditions to speak
of correlates of TV ownership rather than effects.


Status of Television at Time of Study 


At the time this study was conducted, two 
television outlets were available to set owners 
in the Twin Cities. KSTP-TV had been in 
operation for 24 months and was telecasting 
about 7 hours per day. WTCN-TV had been 
in operation for 10 months and also was 
telecasting about 7 hours per day. 

In April and May, 1950, both these outlets
were operating as local stations without “live”
network shows. A coaxial cable carrying
network television to the Twin Cities was 
not completed until October 1, 1950. 

The average length of set ownership in our 
sample was just over 5 months, another 
indication that the use of this medium in the 
Twin Cities area was in its infancy. Michael 
and Jordan (13) in a study done in a large 
eastern city were able to analyze television 
effects after the novelty of the set had worn 
off by limiting their sample to persons who had 
their sets for more than a year. (Some of 
their respondents had owned sets for nine 
years.) 


Results 


Television Ownership and Personal Data. 
A frequently debated point regarding television 
set ownership is the matter of the economic 
status of set owners. Do the wealthier 
persons in a given area tend to own a dis- 
proportionate share of the sets? 

Several findings are available which indicate 
that no pronounced economic stratification of 
set owners has taken place. Tide reports 
(24) that in New York people in C and D 
economic groups appear to be buying TV sets 
in the same relative volume that they buy 
other appliances. “People,” says Tide, “will
soon think of it in ‘necessity’ rather than in
‘luxury’ terms. Like the . . . mechanical refrigerator
and the radio before it, the TV set
is fast becoming an installment-plan purchase.”

Another report based on New York data 
(22) indicates that “modest” income families
own the greatest share of television receivers. 

In the Minneapolis panel, as Tables 1 and
2 indicate, no significant relationship was
found between income level and ownership of
a TV set by either the biserial or the t-test
approach. A slight trend for TV owners to
have higher incomes is noticed, however.

Table 1

Mean and Variance Significance Tests for Television vs. Non-Television Owners on Personal Data
Note: Non-TV-Owners N = 171; TV-Owners N = 31.

                              Mean        Mean          S.D.        S.D.
Variables                     TV Owners   Non-Owners    TV Owners   Non-Owners   F       t
Intelligence (Goossen Test)   50.3        53.4          24.22       12.77        3.50*
Social Activity               2.0         2.1           1.41        1.74                 .30
Education in Years            11.7        11.6          3.01        3.39                 .14
Income                        4,180.0     3,680.0       1,624.19    1,673.02             1.56

* Significant beyond 1% level.
† The Behrens-Fisher test was used in this case where the variances were significantly different.

Neither years of education nor estimated 
intelligence level (provided by the Goossen 
test) revealed any owner vs. non-owner 
differences in means. The data do indicate, 
however, that TV owners are significantly
more variable than non-owners on the intelligence
measure beyond the 2 per cent confidence
level. TV set owners thus represent
a more heterogeneous group intellectually
than non-owners.
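
As a rough illustration of the variance comparison reported for the intelligence measure, the sketch below recomputes a variance ratio and an unequal-variance t statistic from the summary figures printed in Table 1. The function names are hypothetical, and the statistic shown is the simple Welch-style ratio, assumed here only as a stand-in; it does not reproduce the Behrens-Fisher reference distribution mentioned in the table footnote.

```python
import math


def variance_ratio(sd_a, sd_b):
    """Return F as the larger sample variance divided by the smaller."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


def unequal_variance_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference in means over a standard error that does not pool variances."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se


# Intelligence (Goossen Test): TV owners vs. non-owners, figures from Table 1.
f_ratio = variance_ratio(24.22, 12.77)
t_stat = unequal_variance_t(50.3, 24.22, 31, 53.4, 12.77, 171)
print(f"F = {f_ratio:.2f}, t = {t_stat:.2f}")
```

With the printed standard deviations the ratio comes to about 3.6, close to the tabled F of 3.50; the small gap may reflect rounding or a slightly different variance estimator.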

Social activity of the going-out variety is 
slightly but insignificantly lower for set 
owners than for non-owners. No measure of 
frequency of entertaining or social activity of
the others-coming-in variety was obtained.

Television Set Ownership and Use of Other 
Media. Many studies which relate use of
television to use of other media present findings
in terms of the per cent of TV owners who say 
they use other media less since purchasing a 
television set. Usually no pre-television meas- 
ures on these same people are available for 
scrutiny. 

In the Minneapolis panel investigation,
interest was centered chiefly on amount of
difference in other media use in TV-owning
households compared with non-owning households.
Several techniques were used to check
the significance of apparent losses.

Table 2

Biserial Correlations Between Television Ownership and Personal Data
Note: Non-TV Owners N = 171; TV Owners N = 31.

Variable                      Biserial r   S.E. r_bis
Intelligence (Goossen Test)   -.07         .11
Social Activity               -.05         .11
Education                     -.02         .11
Income                        +.16         .11

Simple percentage changes in amount of 
other-media use by TV-owning respondents, 
using non-TV-owning respondents as a stand- 
ard of comparison, show the results indicated 
in the second column of Table 3. 

When other techniques of analysis such as 
biserial r and the t-test of mean differences 
are used, only a few of the “face” differences 
indicated in that column emerge as significant. 

Tables 3 and 4 summarize these analyses 
of findings of the panel study. The formula 
for the standard error of biserial r presented by 
Peatman (14) was used to test biserial signif- 
icances. 
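
The size of the standard errors involved can be approximated from the group sizes alone. The sketch below uses a common large-sample formula for the standard error of biserial r under the hypothesis of no correlation, sqrt(pq) / (y sqrt(N)), where y is the normal ordinate at the point dividing the proportions p and q. Whether this is exactly the formula given by Peatman is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def biserial_se_null(n_group, n_other):
    """Approximate S.E. of biserial r when the true correlation is zero."""
    n = n_group + n_other
    p = n_group / n                  # proportion in the smaller (owner) group
    q = 1.0 - p
    z = NormalDist().inv_cdf(q)      # cut point leaving proportion p in the upper tail
    y = NormalDist().pdf(z)          # ordinate of the unit normal curve at that point
    return math.sqrt(p * q) / (y * math.sqrt(n))


se = biserial_se_null(31, 171)
print(f"S.E. = {se:.3f}; 5 per cent criterion = {1.96 * se:.2f}")
```

For 31 owners and 171 non-owners this gives roughly .11, so a coefficient would need to be about .21 in absolute value to reach the 5 per cent level by the usual 1.96 criterion.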

Table 3 showing the results of F and t tests 
indicates that means both of movie attendance 
and radio listening hours are significantly 
lower for the television-owning group. The 
two groups do not show a significant difference 
on newspaper reading time and none of the 
other media-use variables approach signif- 
icance. 

Table 4, employing biserial r as the quantita- 
tive tool, indicates that use of only one 
medium—motion pictures—shows a significant 
(negative) relationship to ownership of a TV 
set. Two other media, radio hours and book 
reading, barely miss a significant negative 
relationship to set ownership. There seems 
to be very little relationship between owning 
a television set and number of magazines read 
and amount of newspaper reading time. 
Amount of reading in the two selected categories
of newspaper content showed a positive
but insignificant relationship to TV ownership.

Table 3

Mean and Variance Significance Tests for Television vs. Non-Television Homes on Media Usage
and Information and Interest Levels

Note: TV-Owners N = 31; Non-TV-Owners N = 171.

Column headings: % Difference in Usage, Non-TV Owners as Base; Mean, TV Owners; Mean, Non-Owners; S.D., TV Owners; S.D., Non-Owners.

Media Variables                           % Difference in Usage
No. of Movies Attended (past 4 weeks)     Owners 56% less
Radio Hours                               Owners 42% less
No. of Books Read (past 4 weeks)          Owners 22% less
Newspaper Reading Time (min. per day)     Owners 21% less
No. of Magazines Read Regularly           Owners 3% less
Governmental Affairs Reading Score        Owners 9% more
Comics Reading Score                      Owners 13% more

Information and Interest Variables
Events Test
Names Test
Comics Interest
Governmental Affairs Interest

* Significant at 5 per cent level.
** Significant at 1 per cent level.
† The Behrens-Fisher test was used in this case where the variances were significantly different.

A comparison of these results for radio
listening with those obtained in 1949 by
Michael and Jordan (13) in Philadelphia shows
some pronounced differences. In Philadelphia,
where television had become well established, 
TV owners listened to the radio 43 per cent 
less than non-owners (2.04 hours per day 
compared with 3.58 hours per day). In 
Minneapolis, TV owners listened only 22 per 
cent less, the comparable hour figures being 
2.40 for owners and 3.07 for non-owners. This 
radio listening “loss” by TV owners in Phila- 
delphia in 1949 is significantly greater at the 


one per cent level of confidence than the “loss” 
in Minneapolis in 1950. 
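
The percentage "losses" quoted above follow directly from the hour figures, taking non-owners as the base. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def percent_less(owner_value, non_owner_value):
    """Percentage by which owners fall below non-owners, non-owners as base."""
    return 100.0 * (non_owner_value - owner_value) / non_owner_value


# Radio listening in hours per day, as given in the text.
philadelphia = percent_less(2.04, 3.58)
minneapolis = percent_less(2.40, 3.07)
print(f"Philadelphia: {philadelphia:.1f}% less; Minneapolis: {minneapolis:.1f}% less")
```

The two results round to the 43 and 22 per cent figures cited in the paragraph.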

Continuing studies of the Minneapolis 
television audience may show the pattern of 
“audience erosion” suffered by movies and 
radio as network television, color TV, and 
improved programming appear. 

Television Set Ownership and Interest and 
Information Levels. To obtain a measure of 
the panel’s interests in governmental affairs 
news content and in comics, a special booklet 
was prepared in which typical governmental 
affairs news items and portions of comic
strips were duplicated. Respondents were
requested to page through this booklet and to
express their interest in each item by selecting
one of the five Likert-type responses provided.
A person’s comics and governmental affairs
interest scores were obtained by summing all
his responses to the various items.

Table 4

Biserial Correlations Between Television Ownership and Usage of Other Media
and Information and Interest Levels

Note: TV Owners N = 31; Non-TV-Owners N = 171.

                                                 Biserial r   S.E. r_bis
Media Variables
Movie Attendance                                 -.26*
Radio Hours                                      -.20
Book Reading                                     -.20
Newspaper Reading Time                           -.14
Magazine Reading                                 -.03
Reading in Category of Governmental Affairs      +.05
Reading in Category Comics                       +.05
Information and Interest Variables
Events Test                                      -.20
Names Test                                       -.15
Comics Interest                                  +.01
Governmental Affairs Interest                    +.06

* Significant at 5% level.

It was expected that these measures would 
be somewhat independent of the actual 
reading done by the panel members in these 
categories. This expectation proved correct. 
For governmental affairs the interest vs. 
reading correlation was +.35 and for comics, 
the correlation between interest and reading 
was +.56. 

Information level in the governmental 
affairs area was determined by two short and
highly discriminating tests especially con- 
structed to sample knowledge of governmental 
affairs for the time period immediately 
preceding the field work. These tests were 
arranged to be self-administering. They were 
presented to the panel respondents by inter- 
viewers who remained in the background 
after making certain that the respondents 
knew how to proceed in answering them. 

One of these tests, a 17-item names test, 
required the matching of names of persons 
prominent in international, national, state, 
and local governmental affairs with short 
identification sentences. The Hoyt reliability 
of this test was .92 and a corrected split-half 


reliability was .91. A second test, a 12-item 
events test, had a multiple choice format. It 
was less reliable (.65) and less effective in 
discriminating power. 
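
A "corrected" split-half coefficient is conventionally obtained with the Spearman-Brown formula, which steps the correlation between two half-tests up to full test length. That this correction was the one applied here is an assumption; the sketch below, with hypothetical function names, shows the formula and its inverse.

```python
def spearman_brown(half_r, length_factor=2.0):
    """Reliability of a test lengthened by length_factor, given the shorter test's r."""
    return length_factor * half_r / (1.0 + (length_factor - 1.0) * half_r)


def half_test_r(full_r):
    """Invert the two-halves case: correlation between halves implied by full_r."""
    return full_r / (2.0 - full_r)


# A corrected split-half reliability of .91 would imply this correlation between halves.
implied = half_test_r(0.91)
print(f"implied half-test r = {implied:.3f}")
print(f"stepped back up     = {spearman_brown(implied):.2f}")
```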

Table 3 shows results of F and t tests run 
on these variables. No significances emerge, 
although the trend is for TV owners to express 
more interest in governmental affairs and to 
have less information as measured by both 
tests. 

Table 4 shows the biserial r results for these 
variables for TV-owning and non-TV-owning 
households. 

Despite the failure of any of these coefficients 
to meet a 5 per cent confidence level, it is 
interesting to observe that both information 
coefficients are negative, and one of them 
approaches the 5 per cent level quite closely. 


Implications 


Television brings the movie into the home. 
Its content appears to compete most directly 
with movie and radio. It seems to erode 
least the audiences of newspapers and mag- 
azines. 

While television may improve its program- 
ming in the next year or two, directors of 
other media will be studying their own content, 
seeking ways to hold their audiences. 

The following questions were suggested by 
several findings in this study. Answers to 
them would have important implications for 
the mass media. 

How do trends in “audience erosion” relate
to the subject matter of television, its direct
competitors (movie and radio), and its indirect
competitors (books, magazines, and newspapers)?

When television takes the time an individual 
once gave to other media, does that individual 
learn less about significant events? 

Or will television stimulate book reading and 
interest in significant content as radio has 
done on occasion? 


Summary 


A probability sample of Minneapolis adults 
reported their media habits in four interviews 
in April and May, 1950. These adults also 
submitted to information tests. A disguised 
intelligence test gave an estimate of their
intellectual levels. Of this sample of 202
adults, 15.4 per cent owned television sets. 













At the time of the interviewing Minneapolis 
had two television stations. These stations 
were using local programs. 

Owning a television set was related to two 
differences in media use that were significant: 
TV-set owners went to fewer movies and said 
they spent less time listening to radio than 
non-owners. 

Trends were observed for TV-set owners to 
read fewer books; their estimates of time 
spent reading newspapers were slightly less 
than non-owners. 

In the number of magazines read, TV-set 
owners and non-owners showed the least 
difference, percentage-wise. 

Two categories of newspaper reading were 
studied on two different days. No owner 
vs. non-owner differences emerged. 

TV-set owners tended to know less about 
current governmental affairs. On two meas- 
ures of information, differences approached 
significance. Owners did not significantly 
differ from non-owners in incomes, education, 
social activity or intelligence. The TV-set 
owners showed wider variability in intelligence
than non-owners. 


Received November 30, 1950. 


Transfer of Training in Tracking Performance at Different Target Speeds *


Robert S. Lincoln and Karl U. Smith 


University of Wisconsin 


Although some data are available¹ concerning
the relation between accuracy of visual
tracking and target velocity, critical studies 
have not been made in regard to the effect of 
training in performance at one target speed 
upon later performance at different target 
speeds. The present investigation has been 
concerned with this transfer problem as one 
aspect of the understanding of the properties 
of stability and reliability of measures of 
tracking performance. 


Apparatus 


A visual tracking apparatus has been con- 
structed for this study which possesses certain 
simplified characteristics in design for labora- 
tory purposes. 

Figure 1 is a schematic diagram of the com- 
ponents of the apparatus. The total tracking 
system may be considered as five linked sys- 
tems: (1) the hand control, (2) the visual 
presentation, (3) the target generator, (4) the 
differential, and (5) the recording mechanism. 

The hand control of the apparatus consists 
of a crank with a radius of 7} inches, which is 
connected to a steel shaft 41 inches long by 
means of a pair of reduction pulleys and a 
rubber belt. A third pulley, located on the 
end of the steel shaft, drives an indicator 
pointer. This pointer makes one complete 
revolution for every 34 revolutions of the hand 
crank. The subject’s task is to keep the indi- 
cator pointer aligned with the target, which is 
attached to a moving disc located just behind 
the pointer. 

Figure 1 also clearly pictures the make-up
of the visual presentation. The target itself 
is a strip of metal #g of an inch wide and }§ of 
an inch long, painted white, and mounted on 
the black target disc, 18 inches in diameter. 
The indicator pointer is 7} inches long. Over 
most of its length it is painted black to blend 
in with the background provided by the target 
disc. The tip of the indicator, however, is 
reduced to the dimensions of the target, and 
like the target, is also painted white. Indi- 
cator and target are so arranged that their tips 
just meet when the two are aligned. Parallax 


* This research has been supported by the Research
Committee, The University of Wisconsin Graduate
School.

¹ Hill, H., Gray, F., and Ellson, D. G. Wave-length
and amplitude characteristics of tracking error curves.
Engineering Division, Air Materiel Command, Dayton,
Ohio, Report No. TSEAA-694-2D, 1947.


effects were minimized by arranging the indi- 
cator and the target in approximately the same 
plane. 

Within the target generating system there 
are three main elements, a differential analyzer, 
a cam drive, and the target disc. The target 
disc rotates with the output shaft of the ana- 
lyzer. It is the function of the analyzer to 
transform into target movement the force sup- 
plied by the variable-speed motor driving the 
analyzer. Target speed has been specified in 
terms of the number of r.p.m. at which this 
motor is turning. The pattern of movement 
of the target is determined by the shape of 
the cam used. As a result of this arrangement,
the velocity and pattern of target movement
vary according to the slope of the cam. The
cam-drive function is transmitted to the differ- 
ential analyzer by means of a small wheel on a 
cam-drive arm. A spring keeps this wheel 
against the edge of the cam at all times. The 
cam itself turns on the shaft of a one-r.p.m. 
motor. 

In any tracking device a differential system 
is required in order to permit the continuous 
recording of the differences existing between 
target movement and control adjustments made 
by the subject operator. For this apparatus, 
a simple mechanical-electrical differential con- 
sisting of two selsyns was designed. Mounted 
on the output shaft of the analyzer is a trans- 
mitter selsyn. The target disc is attached to 
the outside casing of this selsyn. Through 
this arrangement, the target is made to follow 
the same pattern of movement as that of the 
outside casing of the transmitter selsyn. The 
shaft, carrying the indicator pointer which the 
tracker controls, is connected to the shaft of 
this selsyn by means of a heavy rubber tube. 
If the individual tracking the target keeps the 
indicator pointer constantly aligned with the 
target, the shaft of the recording selsyn does 
not move, for the shaft of the transmitter 
selsyn is not moving relative to its outer casing. 
If the tracker fails to keep the target and 
indicator aligned, however, the shaft of the 
recording selsyn moves in the direction of the 
error and with a magnitude which corresponds 
to the magnitude of the off-target distance.

Error recording is accomplished in two differ- 
ent ways with this apparatus. Both records 
depend upon the movement of the shaft of the 
recording selsyn. Attached to this shaft is a 
writing point which moves over a waxed-paper 
kymographic recorder. The writing point is 
heated with an electrical heating element in 
order to obtain a clear record. A second
measure of accuracy is arranged in terms of an
on-target time indicator. A clock is connected
into the system in such a way that the number
of seconds is recorded in which a subject is on
target during a given trial. This is accomplished
through a lever connected to the shaft
of the recording selsyn for the purpose of transmitting
the shaft movement to a pointer moving
laterally over a brass strip, in the center of
which is inserted a narrow plastic inlay. When
the tracker is on target, the pointer rests on
the plastic, and when off target, the pointer
contacts the brass strip. An appropriate electrical
circuit permits the clock to run while
the pointer is on the plastic inlay, but stops the
clock when contact is made with the brass
strip. On-target readings are made to the
nearest one-hundredth of one second.

Fig. 1. Schematic diagram of the apparatus.

Fig. 2. Photograph of the task situation for the subject.

Figure 2 is a photograph of the apparatus 
and shows a subject in tracking position. In 
overall height the apparatus measures 57} 
inches, while its width measures 40 inches. 
Visible on the left side of the picture is the 
clock in its sound-proofed container. During 
actual trial runs, the clock was not visible to 
the subject. 


Experimental Design 


The primary theoretical problem in this 
study is concerned with the specificity of 
psychomotor behavior in relation to learning 
and transfer of learning effects. If individuals 
are trained in tracking performance at different
target speeds, how is this performance affected
at a later time when these same individuals 
are required to track at target speeds differing 
from the one on which they were trained? 
To such a question there are two possible 
answers. The effects of training may be 
highly specific in nature. If so, it is to be 
expected that practice at certain target speeds 
would be more effective than practice at 
other speeds in increasing the accuracy of 
performance when new target speeds are 
introduced. On the other hand, the effects of 
training could be of a highly generalized 
nature. In this case the expectation would
be that practice on any speed would lead to
similar levels of performance at new target
speeds. The experimental design of this
study was developed to test these two possibilities.

The arrangement of the experiment also 
provided information concerning the relation 
of tracking accuracy to absolute target 
velocity and the extent to which practice 
operates as a means of increasing accuracy in 
tracking performance. 

During the first experimental day, each
of 36 subjects ran five one-minute trials
involving the same target pattern and velocity. 
This pattern differed from the one used 
throughout the remainder of the experiment. 
These trials served both as preliminary practice 
and as a means of determining the subject’s 
initial level of tracking performance. On the 
basis of on-target scores obtained in these 
first trials, the subjects were divided into three 
equated groups of 12 subjects each (Groups I, 
II, and III). Experimental days two through 
five were spent in training by the three groups. 
Group I received five, one-minute trials per 
day with the motor driving the target set at 
23 r.p.m., while Group II worked at a speed 
of 30 r.p.m., and Group III at 37 r.p.m. 
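
The report does not say how the three groups were equated. One simple possibility, offered only as an assumption, is to rank subjects on their first-day on-target scores and deal them into groups in alternating (serpentine) order. The function name and the illustrative scores below are hypothetical.

```python
def equate_groups(scores, n_groups=3):
    """Deal subjects into groups in serpentine order of initial score.

    Returns a list of groups, each a list of subject indices into `scores`.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    groups = [[] for _ in range(n_groups)]
    for position, subject in enumerate(ranked):
        block, offset = divmod(position, n_groups)
        # Reverse the dealing direction on every other block of n_groups subjects.
        group = offset if block % 2 == 0 else n_groups - 1 - offset
        groups[group].append(subject)
    return groups


# Illustrative scores only: 36 subjects with made-up initial on-target times.
initial_scores = [20.0 + 0.5 * i for i in range(36)]
for label, members in zip(("I", "II", "III"), equate_groups(initial_scores)):
    mean = sum(initial_scores[i] for i in members) / len(members)
    print(f"Group {label}: n = {len(members)}, mean initial score = {mean:.2f}")
```

With evenly spaced scores the three group means come out identical; with real scores they would only be approximately equal.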

The final three days of the experimental 
period made up the transfer test days. During 
this time, the training groups were run for 
five, one-minute trials on each of the three 
days at a different target speed. These three 
speeds corresponded to the three training 
speeds used. In this way, all groups were 
tested for one day on their own training speed, 
and on the other two days at the remaining 
groups’ training speeds. Random assignment 
of the order of performance on the transfer 
speeds eliminated the effects of practice 
within the transfer days. 
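
The random ordering of transfer speeds can be sketched as follows. The report gives only the fact of random assignment, so drawing each subject's order independently from the six possible permutations is an assumption, as are the function name and the seed parameter.

```python
import random
from itertools import permutations

# Motor speeds (r.p.m.) used for training and for the transfer tests.
SPEEDS_RPM = (23, 30, 37)


def assign_transfer_orders(n_subjects, seed=None):
    """Give each subject a randomly chosen order of the three transfer speeds."""
    rng = random.Random(seed)
    possible_orders = list(permutations(SPEEDS_RPM))
    return [rng.choice(possible_orders) for _ in range(n_subjects)]


orders = assign_transfer_orders(36, seed=1950)
for subject, order in enumerate(orders[:3], start=1):
    print(f"Subject {subject}: transfer days 1-3 at {order} r.p.m.")
```

A design that balanced the six orders exactly within each group would control practice effects more tightly than independent draws; which of the two the authors used is not stated.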


Results 


Fig. 3. Performance of training groups on successive training days.

Fig. 4. Performance of training groups when tested at transfer speeds.

Figure 3 is a plot of the mean on-target
scores made by the 12 subjects of each group
on the training days indicated. In this plot,
mean on-target time is plotted on the ordinate.
The abscissa indicates the day of training. 
Throughout the four training days, the group 
training at the lowest target speed made the 
highest accuracy scores, while the group train- 
ing at the high target speed made the lowest ac- 
curacy scores. The scores of that group train- 
ing at the medium target speed fell consistently 
between these two extremes. Little overall rise 
in accuracy scores was demonstrated over the 
total training period. The “t” tests indicate, 
however, that the difference between the day 5 
and day 2 mean scores for each of the groups 
is significant at the .01 level. Because of the 
nature of the experimental design, the day 1 
scores are not comparable to the scores made 
on the other training days. 

Scores concerned with the specificity of 
tracking training were obtained during the 
transfer-test days. The transfer-test scores 
are shown in Figure 4. In this figure mean
on-target scores are given on the ordinate and 
the absolute target-speed level, expressed in 
terms of the velocity of the motor drive, is 
plotted on the abscissa. The parameter is 
group training speed. 

According to Figure 4, the group trained at 
the medium target speed made scores superior 
to those of the other two groups on both the 
low and the medium transfer speeds. On the 
high transfer speed, however, a reversal
appeared, and the group trained at the medium
target speed made a lower mean score than did 
the other two groups. As in the training data, 
a consistent drop in accuracy scores appeared 
to be associated with an increase in target
speed.

An analysis of the variance in the transfer
data indicated that the effect of the target- 
speed variable was significant at the .001 level. 
The training group variation was not signif- 
icant. This fact implies that practice at a 
given target speed produced no significant 
superiority of one group over the others at all 
target speeds. There was, however, a signif- 
icant interaction of training groups and 
transfer speeds. The medium speed training 
group apparently accounts for this effect, for 
it was superior at the low and medium transfer 
speeds and inferior at the high transfer speed. 


Summary and Conclusions 


1. A visual tracking apparatus of simplified 
design was constructed for the investigation 
of some of the properties of tracking behavior. 

2. In a study designed to test the specificity 
of tracking training in relation to target speed, 
36 subjects were first divided into three groups, 
equated as to initial tracking ability. Each 
group was then trained at a different target 
speed during a period extending through four 
days. Upon the completion of this training, 
the training groups were transferred to tests
at the three training speeds, so that all
groups tracked for one day at each of the 
three speeds. 

3. The resulting data showed significant 
differences between performances at the three 
target speeds, and practice with this apparatus 
appeared to be relatively ineffective as a 
means of increasing accuracy scores. 

4. Of particular importance in the study 
was the attempt to provide information 
regarding the problem of the organization of 
psychomotor behavior. Under investigation 
was the specificity of tracking training in a 
transfer situation. The results of the study 
showed that subjects trained at a medium 
target speed made scores significantly superior 
to those made by groups trained at low and
at high target speeds, when later tested on the
low and medium target speeds. This same 
group, however, made inferior scores when 
tested on the high target speed. Because of 
this, it seems reasonable to conclude that 
there are specific factors involved in tracking 
training, i.e., that training at certain target 
speeds leads to superior performance when 
particular target speeds are introduced at a
later time. As no significant overall group
differences appeared, however, this superiority
cannot be said to hold for any one training
group at all target speeds. It is likely that
the general inconsistency of psychomotor
performance, as well as the low degree to
which such behavior can be predicted, may in 
part depend upon this specific effect of training 
in the psychomotor task. 


Received October 27, 1950. 





Dimensional Analysis of Motion: I. Effects of Laterality 
and Movement Direction * 


Roger Davis,† Robert Wehrkamp, and Karl U. Smith


University of Wisconsin 


Although numerous experiments have been 
reported concerning the nature of manual 
response in persons of different sidedness, 
careful analysis of the reactive features of 
manual responses in such groups has not been 
made. In this study, special techniques for 
the measurement of separate components of a 
movement pattern have been developed and 
applied to analysis of manual motions in 
right- and left-handed individuals. The ex- 
perimental methods used have permitted 
investigation of handedness in relation to the 
component characteristics of movement and 
to fundamental space dimensions of the 
movement pattern. 

* This research was supported by the Research Committee, The University of Wisconsin Graduate School.
† Now at the University of South Dakota.


Methods


The technique of motion study used in this
experiment is based on a device which has
been named the “Universal Motion Analyser.”
This technique provides the following advantages
in investigation of movement patterns:
(1) automatic registration of elapsed time in a
set task, (2) separate automatic registration of
elapsed time in the two basic components of
manipulation and travel involved in a movement
pattern, and (3) flexibility of application
of this analytic method to different task
situations.

Fig. 1. [Diagram of the Universal Motion Analyser; panel label: CONTROL PANEL.]

Figure 1 illustrates diagrammatically the
separate features of the universal motion analyser.
The timing circuit of the device consists
of an electronic relay operated on a current
level of subthreshold value for the human skin.
Connections are made between this relay and
the subject and between the relay and objects
and controls to be manipulated. The electrical
connection with the operator is made by
means of an electrode attached to the ankle.
The electronic relay is made to activate a
double-pole, double-throw external relay, which
is used to operate precision time clocks or recording
devices. One direction of action of
this relay operates a clock which registers
manipulation time or contact with the object
or control involved in the task. The other
direction of action of the relay activates a 
second clock which records the travel time 
between two successive contacts with the ob- 
jects or controls used in the task. A third 
clock, registering total time, is used as a check 
indicator for the clocks measuring the different 
components of the motion. 
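
The three-clock arrangement can be illustrated with a short sketch. This is not the authors' apparatus, only a software analogue under stated assumptions: the input is a hypothetical list of (touch, release) timestamps in seconds, one pair per control operated, and total time is taken here to run from the first touch to the last release.

```python
from typing import List, Tuple


def split_times(contacts: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (manipulation, travel, total) times in seconds.

    `contacts` is a hypothetical, time-ordered record of
    (touch, release) timestamps, one pair per control operated.
    """
    # Manipulation clock: runs while the hand is in contact with a control.
    manipulation = sum(release - touch for touch, release in contacts)
    # Travel clock: runs between the release of one control and the touch of the next.
    travel = sum(
        next_touch - release
        for (_, release), (next_touch, _) in zip(contacts, contacts[1:])
    )
    # Total clock (assumed span): first touch to last release.
    total = contacts[-1][1] - contacts[0][0] if contacts else 0.0
    return manipulation, travel, total


# Invented timestamps for three switch contacts, for illustration only.
example = [(0.0, 0.5), (0.75, 1.25), (1.5, 2.0)]
manip, travel, total = split_times(example)
print(manip, travel, total)  # prints 1.5 0.5 2.0
```

As with the third clock in the apparatus, the total serves as a check: the manipulation and travel components should sum to it.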

In order to investigate various dimensions of 
human motion such as those of significance to 
this study, the timing circuit just described is 
connected to a general control panel, as dia- 
grammed in Figure 1. This panel, five feet 
long and three feet high, is constructed with 
rotary switch controls arranged at regular inter- 
vals on the panel. Electrical connections are 
made between each of the switches and a
terminal board. Appropriate controls for selected
combinations of switches to be operated are 
made through this terminal board. 

Arrangement of the apparatus for the present 
problems involved, as a first step, the specifi- 
cation of different directions of movement. 
Four different directional patterns were used. 
In terms of the numbers for the switches given 
on the diagram, these four different patterns 
involved turning the switches in the following 
sequences: 


Direction I: Left—Right: 1, 2, 3, 4, 5, 6, 7, 
8, 9, 10, 11, 12, 13, 14, 15, 16. 
Direction II: Right—Left: 4, 3, 2, 1, 8, 7, 6, 
5, 12, 11, 10, 9, 16, 15, 14, 13. 
Direction III: Up—Down: 1, 5, 9, 13, 2, 6, 
10, 14, 3, 7, 11, 15, 4, 8, 12, 16. 
Direction IV: Down—Up: 13, 9, 5, 1, 14, 10, 
6, 2, 15, 11, 7, 3, 16, 12, 8, 4. 


The total distance and number of switches 
turned in each pattern are identical. 
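
The four sequences can be generated and checked programmatically. The sketch below assumes what the listed sequences suggest but the text does not state outright: sixteen switches in a 4 x 4 grid, numbered 1 to 16 row by row from the upper left. The path-length check further assumes equal unit spacing between neighboring switches in both directions, which is an assumption made here for illustration; the function names are hypothetical.

```python
import math

ROWS, COLS = 4, 4  # assumed 4 x 4 layout, numbered 1..16 row by row


def position(switch):
    """Return (row, col) of a switch under the assumed layout."""
    return divmod(switch - 1, COLS)


def pattern(direction):
    """Return the switch order for direction "I", "II", "III", or "IV"."""
    grid = [[r * COLS + c + 1 for c in range(COLS)] for r in range(ROWS)]
    if direction == "I":    # left-right within each row
        return [s for row in grid for s in row]
    if direction == "II":   # right-left within each row
        return [s for row in grid for s in reversed(row)]
    columns = [list(col) for col in zip(*grid)]
    if direction == "III":  # up-down within each column
        return [s for col in columns for s in col]
    if direction == "IV":   # down-up within each column
        return [s for col in columns for s in reversed(col)]
    raise ValueError(f"unknown direction: {direction!r}")


def path_length(sequence):
    """Total straight-line travel, assuming unit spacing on both axes."""
    points = [position(s) for s in sequence]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


for d in ("I", "II", "III", "IV"):
    seq = pattern(d)
    print(d, len(seq), round(path_length(seq), 3))
```

Under these assumptions each pattern visits all sixteen switches once and covers the same total distance, which agrees with the statement that distance and number of switches are identical across patterns.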

Handedness of the subject was controlled in
this study by selecting individuals on the basis
of a battery of handedness measures
administered to the subjects.
Thirty-two right-handed and thirty-two left- 
handed male subjects from elementary psy- 
chology classes were selected on this basis. All 
sixty-four subjects, after a practice trial on 
each directional pattern, performed the four 
directional patterns of movement with the pre- 
ferred hand and the same four patterns with 
the non-preferred hand. Subjects were in- 
structed to perform as fast as possible in 
turning the switches in each pattern. 

Parallel analyses of variance were computed 
for the manipulation and for the travel times. 
All second- and third-order interactions were 
summed as the error term since they were all 
of small magnitude. All of the effects referred
to in this paper are significant at the .1 per cent
confidence level or beyond, and do not interact
significantly with individual differences.¹

¹ The detailed statistical analysis has been filed with
the American Documentation Institute. Order Document
No. 3065 from American Documentation Institute,
1719 N Street, N.W., Washington 6, D. C.,
remitting $1.00 for microfilm (images 1 inch high on
standard 35 mm. motion picture film) or $1.00 for
photocopies (6 × 8 inches) readable without optical aid.


Results


Some of the main results of the study may be
described in terms of the following points:

As shown in Figure 2A, right-handed subjects
are superior to left-handed subjects in
regard to both components of travel time and
manipulation time in the type of movements
studied. Relative differences between performances
with the preferred hand and
non-preferred hand are shown in Figure 2B.
For both the preferred and non-preferred 
hand, right-handed subjects are superior to 
left-handed subjects in both manipulative and 
travel components of the movement pattern. 
Right-handed subjects show no difference 
between the preferred and non-preferred hand 
in the travel components of the movement 
pattern, but a significant difference between 
the two hands is found for the manipulative
component of the movement pattern. In the
left-handed group, the reverse is true. Travel 
time for the preferred and non-preferred hand 
in left-handed subjects differs significantly, 
but there is no difference between the two 
hands in these subjects in regard to manipula- 
tion. Figure 2C gives a more complete 
picture of the handedness pattern in terms of 
direction of movement. The letters R-L and
L-R indicate right-left movements and left-right
movements, respectively, in the horizontal
pattern of motion. The letters U-D and
D-U represent, respectively, the up-down and
the down-up directions of action in the vertical
pattern of motion. The more rapid speeds of
motion for the right-handed subjects as
compared to left-handed subjects may be
noted again for both the travel and manipulative
components of the movements. This
difference was found for both horizontal and
vertical motion patterns and for both directions
of motion in the movement pattern. In other
words, the interaction of handedness, pattern,
and direction of motion is not of great significance.
Differences in handedness in this
repetitive clockwise switching motion are
very systematic and appear consistently for
all components of the movement pattern
for different space dimensions of the activity.

Fig. 2C. Performance and handedness and patterns of
movement of the two hands.

Performance varies in relation to the pat- 
terns of motion which have been studied. 
There are no differences between horizontal 
and vertical movement patterns in terms of 
manipulation time, but vertical patterns of 
movement show a faster travel component 
than the horizontal movement patterns (Figure 
3A). When pattern of movement is considered 
in relation to hand preference as shown in 
Figure 3B, the following facts are observed.
Travel time is of shorter duration for both
preferred and non-preferred hands in vertical 
patterns than it is in horizontal patterns. 
Manipulation time does not differ between the 
two patterns in the case of either the preferred 
or non-preferred hand. 

In Figure 3C, the duration of movements 
is shown as a function of direction of motion 
within horizontal and vertical patterns. In 
the horizontal pattern, left-to-right movement 
does not differ from right-to-left motion for 
either manipulation or travel time. In the 
vertical patterns, downward directions of 
motion are faster than upward motion in terms 
of the travel component, but slower with 
respect to time of manipulation. The differ- 
ence in manipulation time is not statistically 
significant. 

This research has dealt with another basic 
aspect of psychomotor activity, already 
touched on in part in the sections just given. 
The property of sensorimotor activity referred 
to is the relation between the reactive com- 
ponents of travel and manipulation in a 
movement pattern. Correlation figures de- 
rived from the separate measures of travel and 
manipulation show that these two reactive 
components are independent of one another. 
Overall, the correlation between the two in 
the experiment is +-0.037. 
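
For readers who wish to see how such a coefficient is obtained, the following is a minimal sketch of the Pearson product-moment correlation between paired travel and manipulation times. It is offered under assumptions: the text does not say which coefficient or computing procedure was used, and the function name and all data values below are invented for illustration, not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented per-subject times in seconds, for illustration only.
travel_times = [1.0, 2.0, 3.0, 4.0]
manipulation_times = [5.0, 3.0, 3.0, 5.0]
print(pearson_r(travel_times, manipulation_times))
```

A value near zero, as reported above, means that a subject's travel time gives essentially no linear information about that subject's manipulation time.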


Summary and Discussion 


A general and applied method of analysis of 
the manipulative and travel components of
psychomotor patterns has been developed. 
The technique, as designed, has universal 
application to motion study of task situations, 
in which it is desired to determine accurately 
the quantitative characteristics of reactive 
components of the behavior involved in 
different control and assembly operations. 

Experimental study, utilizing the special 
techniques developed, has been conducted 
with regard to differences in manual motions 
of right- and left-handed subjects. Some of 
the differences recorded would appear to have 
important application to problems of personnel 
selection and design of equipment in relation to 
handedness of workers and operators. One 
main observation is that left-handed subjects 
are significantly inferior to right-handed individuals
in both manipulation and travel
components of a motion of operating control
switches in a clockwise direction. 

In either right- or left-handed individuals 
handedness does not manifest itself in the 
task situation as a simple matter of general 
superiority of the preferred hand over the 
other. In right-handed subjects, there is no 
significant difference between the two hands 
in the travel component of motion, but the 
right hand is markedly superior to the left in 
the manipulative component of the motion 
pattern. The applied problems of design of 
controls for the two different hands center 
particularly around the questions of manipula- 
tion rather than distance linkages between 
separate controls. 

Data were also obtained concerning differ- 
ences in the manipulative and travel com- 
ponents of motion in different patterns of 
movement. Manipulation time does not
change as a function of direction of movement.
Travel time, however, is shorter for vertical
patterns of motion than for horizontal patterns.
In applying these data to the design of control
panels, it would appear that problems of
horizontal and vertical linkages of controls are 
serious problems only in respect to differences 
in the very rapid travel components of move- 
ment patterns. 

A fundamental finding which has come out 
of the present study has implication for all 
fields of applied and theoretical analysis of 
motion. It has been observed that there 
is no systematic relation or correlation between 
the manipulative and travel components of 
motion patterns in a set task. This fact points
up a need for marked revision of measurement
technique and applied study of psychomotor
tasks in relation to both individual appraisal
of psychomotor performance and general
analyses of psychomotor skill situations. It is 
quite evident that the failure to find significant 
relations between different psychomotor tasks 
and to predict performance in these tasks is 
accounted for in part by unknown variations 
in the basic unrelated components of manipula- 
tion and travel in the task situations. 


Received October 31, 1950.


Fryer, Douglas H., and Henry, Edwin R. (Ed.).
Handbook of applied psychology. New York: 
Rinehart and Co., Inc., 1950. Volumes I 
and II, pp. xix, 842. $12.50. 


1950—Publication of Handbook of Applied 
Psychology! 

The above item might be added to “Landmarks
of Professional Psychology,” an introductory
section to the Handbook, which
lists important events in the history of applied
psychology. These range from Fechner’s
introduction of psychophysical methods in
1860 to the publication of the Dictionary of
Occupational Titles in 1939. For the Handbook
of Applied Psychology was a monumental
undertaking in both meanings of the term;
it is “massive and imposing” and it “serves as
a monument” to those psychologists who have
contributed to the development of psychology
as a profession as well as a science.¹

The size of the undertaking can readily be 
indicated by a few statistics concerning the 
Handbook. It is in two volumes, totalling 
842 double-column pages. There are 115 
separate articles written by 116 contributors; 
several articles have multiple authors and 
several contributors have written more than 
one article. From references in the articles, 
a selected and classified bibliography of nearly 
2,400 items has been compiled and is presented 
in Volume Two. Volume One has a 7-page 
subject index of its own and Volume Two a 
12-page index covering both volumes. 

The 115 contributions (called Sections) 
are classified into 18 ‘chapters’ entitled 
Group Living, Individual Efficiency, The 
Individual Adjusting to His Environment, 
Techniques of Personnel Psychology, Fields of 
Personnel Psychology, Industrial Psychology, 
Transportation, Business Psychology, Educa- 
tional Psychology, Fields of Educational 
Psychology, Clinical Psychology, Fields of 
Clinical Psychology, Penology, Consulting
Psychology, Contributions of Applied Psychology
to the Professions, Contributions of
Applied Psychology to the Arts, Administration
of Psychological Services, The Professional
Psychologist. Each chapter consists of
a number of sections dealing with various
aspects of the chapter title. Business Psychology,
for example, includes a section on
Product Testing by Corby, on Copy Testing by
Freiberg, on Advertising Media Research by
Lucas, on Customer Research by Jenkins,
and on Radio and Television Audience
Research by Dichter.

¹ The reviewer prefers to consider another dictionary
meaning of monumental, viz., “of a size larger than that
of life,” as not being applicable to the situation.

It is obviously impossible for this review to 
present a critical evaluation of each of the 
115 contributions. The Handbook must be 
judged, rather, in terms of its stated purposes: 
“(1) to provide a comprehensive survey of 
applied psychology that differs in point of 
view from the treatment usually found in 
textbooks; (2) to provide a detailed account 
of specialization in professional work that will 
show how psychologists apply the principles, 
techniques, and experimental findings of 
scientific psychology to the solution of every- 
day practical problems of society; and (3) to 
provide a description of the organization and 
administration of professional psychology, its
standards and requirements, its relationships 
with other professions, and its contributions 
to their practice.” 

In line with these three purposes, the sections 
are of three general types: (1) those like 
Section 10—Illumination (Farnsworth) which 
provides a concise, fact-packed summary of
what is known about the characteristics of
illumination; (2) those like Section 28—Item
Analysis (Wherry), which describes current
methods for evaluating test items; and (3)
those like Section 106—Administration of
Psychological Services in Public Schools
(Luckey), which describes the place, functions, 
working conditions, and interprofessional re- 
lations of psychologists at work in the public 
schools. 

Reading through the two volumes seriatim,
as the reviewer did, is really quite an experience.
In a way, it is like looking down on an
ant-hill, watching the ants going busily about 
their varied tasks, some going one way, some 
seeming to accomplish a lot, some seemingly 
running around in circles. Likewise, we 
find psychologists scurrying about at a 
multitude of tasks—from psychological warfare 
(Section 5) to designing dial faces (Section 49
—Bio-Mechanics), to evaluating class size
(Section 67—Instruction in College and
University), to detecting criminals (Section 83
—Examination of Offenders), to graphing the
pitch curves of actors’ voices during emotional
expression (Section 103—The Theatre), to 
staffing a psychological center (Section 110— 
Administration of the General Psychologic 
Center). Besides the traditional applications 
of psychology to education, industry, and 
clinical practice, psychologists are found 
working or consulting in library science 
(Section 95), typography (Section 9), highway 
design (Section 50), engineering (Section 93), 
and even eugenics (Section 98). 

In fact, the Handbook itself is an epitome of 
the general state of applied psychology in 
1951. The various sections in the Handbook 
exhibit the same characteristics as various 
fields of application or various psychologists 
in the field. Some of the sections are rather 
superficial, others are thorough; some are 
largely impressionistic descriptions of standard 
procedures and common beliefs, others present 
objective findings based on sound research; 
some are well-organized, others are rather 
casual observations; some articles have a 
well-defined topic, others seem to be vague 
as to objectives and to overlap with other 
sections; some are more concerned with facts, 
others with what to do; some are more con- 
cerned with enlarging the field of knowledge, 
others with taking stock and evaluating the 
present knowledge. The Handbook even
illustrates the difficulties psychology faces in
getting its research findings put to practical
application; its own typography makes liberal
use of all-caps although Section 9 on Typography
presents evidence that “all-capitals
text retarded speed of reading by about 12 
per cent” and states that “few typographical 
factors will retard reading to this extent.” 

Unfortunately, although the Handbook as 
a whole epitomizes applied psychology as a 
whole, it would not be safe to conclude that 
each section accurately represents the stage of 
development of the particular area of applied 
psychology being discussed. It is on this 
point that the Handbook suffers a major 
limitation, in the opinion of this reviewer. 
If the mythical Man from Mars were to try to 
gain a quick picture of the profession of 
psychology just from reading the Handbook, 
he would get a distorted picture. For example,
one would think that psychologists in speech
work were concerned more with assigning
Greek terms to speech difficulties than with
speech correction, that psychologists paid more
attention to ventilation (two excellent summaries
in Sections 11 and 45) than to job
classification and job evaluation (a three-page
section), that American psychologists were
more concerned with the behavior of criminals
(a whole chapter on Penology covering 32
pages) than with their own professional
problems (Chapter XVIII—The Professional 
Psychologist, covering 25 pages). Such a 
limitation is inevitable in a handbook with 
multiple authors, unless the editors would 
plan the sections in detail and rigidly prescribe 
how they were to be written. 

On the other hand, one does find reflected 
in the articles many of the basic problems 
affecting psychology as a profession. There 
is disagreement, or at least confusion, as to 
the function of the psychologist as a therapist. 
The increasing need for specialization as 
knowledge expands is counterbalanced by 
attempts to maintain a broad, common 
foundation for all psychologists (see Hunt’s 
discussion of the training of clinical psychol- 
ogists, Section 112) or to insure that the 
client gets the benefit of the pooled knowledge 
of associated specialists (as described in 
Section 110 in Administration of the General 
Psychologic Center). That psychologists are 
being recognized as a professional group is 
evidenced by the number of contributors 
whose job title is “psychologist.” The number
of articles on administrative problems suggests 
the rapid growth in psychological programs 
and the increasing autonomy of psychologists. 
Finally, as Coxe states in Section 115, Professional
Attitudes, “we are not only a group
of inquiring psychologists; we are a professional 
group living in a complex society.... If 
psychology is becoming a profession as we 
think it is and if individuals are to seek 
professional employment as psychologists, the 
psychologists must understand as far as 
possible the world in which they are to 
function.” 

This growing need for psychologists to be 
able to adapt their specialized knowledge to 
related areas of application is well stated in 
Wulfeck’s comments on the Consulting Psychologist
in Business & Industry (Section 87)
and in Cowan’s description of the psychological
consultant in a social agency (Section 96). 
The correlative need for psychologists to be
concerned with the societal as well as scientific 
outcomes of their work is not only recognized 
by many of the writers but is evidenced in 
the Handbook itself. The Chapter on Group 
Living, through no particular fault of the 
contributors involved, is a relatively weak 
portion and reflects the present lack of a 
concerted attack on psychological aspects of 
social problems. 

The most difficult decision for this reviewer 
concerned the evaluation of the Handbook as a 
reference work. It would seem that the 
Handbook will be used primarily in that 
fashion, i.e., as a source to dig into when attack- 
ing some phase or surveying some area in 
applied psychology. Unfortunately, the 
Handbook suffers from some basic limitations 
in this respect. 

First, the articles are not consistently 
up-to-date, even as of the time of publication. 
The Handbook was long in planning and
execution. Many of the articles were originally
written in the early 40’s but World
War II necessitated a postponement of
publication. Although an attempt was made
to revise the articles after the war, the results
are not consistent and some of the presentations
are essentially pre-war in flavor and
coverage. This fact results in certain anachronisms
due to the long time interval between
date of writing and date of publication and
the reader sometimes wonders to what date
terms like “current” or “prevailing” apply.
The bibliographic references are more up-to-date.
Of the nearly 2400 listed in the appendix,
approximately one-fourth are dated 1945
or later. The system of references, however,
creates some difficulty in using the Handbook
as a source-work. In the text, references
are indicated by name of author and date of
publication. At the end of each section, a
selected list of general references on the topic
is listed. At the end of Volume Two, the
many references from the Handbook as a
whole are collated under six major headings:
Individual & Group Living, Personnel Psychology,
Industrial & Business Psychology,
Educational Psychology, Clinical Psychology,
and Professional Relations. Although these
areas seem to follow the order of the chapters,
yet one frequently finds difficulty in tracking
down a given reference from the text, especially 
since the bibliography at the end is “selected” 
rather than complete. Some of the references 
are therefore indeterminate. In addition, 
there are a number of instances of discrepancies 
in the dates as given in the text and the refer- 
ences as listed in the bibliographies. 

The Handbook is also limited in its use as 
a reference source because of the infrequent 
instances of cross-references, despite the over- 
lap in topics. It would have been a tremen- 
dous editing problem to provide a complete 
set of cross-references (and the editors should 
not be criticized for failing to undertake such 
a task) but some of the more obvious cross- 
references might have been made. For ex- 
ample, although there is a general treatment of 
The Psychological Examination in Section 71, 
there are also specific discussions of psychologi- 
cal examining in working with mental deficiency 
(Section 77), with mentally ill (Section 78), 
with physical disease (Section 79), and with 
children (Section 80). The Subject Index 
compensates somewhat for the lack of cross- 
references, but even this leaves something 
to be desired as, for example, the index 
listing for Morale refers to Section 3—Morale 
and Section 47—Military Morale Procedures 
but not to Section 46—Industrial Morale. 

The limitations described above, however, 
derive primarily from the practical difficulties 
involved in the execution of a plan of wide 
scope and complexity rather than from the 
plan of the publication itself. 

One may legitimately ask, “Why should I 
invest $12.50 of my hard-earned salary for this 
handbook?” The answer is fairly obvious. 
Nowhere else can one find under one cover 
equally concise and systematic surveys of the
accumulated facts in the field of applied 
psychology, descriptions of the areas in which 
psychologists work, and observations on the 
professional problems faced by psychologists. 
It is like having 116 experts lined up outside 
one’s office and being able to call them in when 
needed to obtain the benefit of their reading, 
research, and professional experiences. If 
$12.50 seems like a lot, compute the consulting 
fees represented by these 116 experts at your 
beck and call. 

This Handbook should be in the library of 
every institution and of every psychologist—
both pure and applied. For the pure psychologist,
it will serve as an eye-opener to
what his fellow psychologists are doing, and 
as a source of suggestions for areas of needed 
basic research. For the applied psychologist,
it will be a source of information as to related
fields and to general professional problems.
For the student, it will serve both as a concise
summary of findings in certain areas of
psychology and as a means of orienting himself
to professional outlets for his psychological
interests. Trained social scientists in general 
should find it a useful means of finding out 
what psychologists are able to contribute to 
related disciplines. 

This reviewer hopes that this Handbook, 
like the Census, will have a decennial edition. 


Albert S. Thompson 


Teachers College, Columbia University


Steiner, M. E. The psychologist in industry.
Springfield: Charles C Thomas, 1949. Pp. 
vii + 107. $2.00. 

“This lecture in print has been prepared 
especially for employers, supervisors, and 
counsellors in industry who wish to become 
better acquainted with the work of the 
industrial psychologist.” The first part dis- 
cusses selection and placement techniques; 14 
pages being devoted to the application blank, 
interview, job evaluation and time and motion 
study; and an additional 57 pages cover 
testing, with 20 of these pages given to 
projective techniques. Part two consists of 
12 pages on “observing the worker on the 
job” (environment, accidents, fatigue, absen- 
teeism, turnover, ratings, and morale). Part 
three (15 pages) deals with employee counsel- 
ing. 

Although parts of the book are excellent, 
the total effect is disappointing. It is doubtful 
that a brief book for employers should include 
definition and description of average deviation, 
standard deviation, standard error of the 
mean, probable error of the mean and chi- 
square; and, if they should be included, 
certainly they should require more than the 
single page allotted. Also questionable, in a 
book of this type, is the listing of names and 
addresses of test publishers and listing of 
tests (including twenty “better known” per- 
sonality questionnaires) with no information 
other than what they purport to measure and 
the publisher. Description of how to compute 
IQs seems out of place in a book for employers. 
Emphasis on projective techniques, especially 
the Rorschach, does not describe the work of 
typical industrial psychologists. 

In general, other books will give employers 
a better description and understanding of the 
psychologist in industry. 

C. E. Jurgensen 

Minneapolis Gas Company 


Baumgarten, F., Ed. Progress of psychotechnics
I: 1939-1945. Bern: A. Francke, 1949.
Pp. 315. S. Fr. 35.

In many countries the past 10 to 15 years 
witnessed momentous changes in the fate and 
activities of psychologists concerned with the 
manifold problems of human work. At the
same time it has been particularly difficult for
any one individual to keep abreast of the 
developments taking place abroad. Any at- 
tempt, based on the literature available up 
to now, to arrive at a comprehensive picture 
of the recent past and the current international 
status of this sector of applied psychology 
would have necessarily resulted in failure. 

World War II interrupted the free flow of 
publications. In many places the direction of 
research was altered drastically. The files of 
“classified” material swelled while some of the
leading European journals were forced to 
cease publication (e.g., the French Le Travail 
Humain). In some countries in which psy- 
chology showed rapid advances in the thirties, 
e.g., in Poland, most of the media of publica- 
tion, institutions of psychological service and 
research, and personnel were annihilated. 

However, profound upheavals in the personal 
lives and professional activities of applied 
psychologists took place long before 1939. 
In Germany, dislocation of personnel began in 
1933 when Hitler came to power. The Nazi 
pressure resulted in the forced emigration of 
such eminent workers as William Stern and 
Walter Blumenfeld, Paul Lazarsfeld and
Charlotte Bühler. We have little detailed
and verified information about the bloody
purge of industrial psychologists in the middle
thirties in the Soviet Union when psychological
tests and “bourgeois” ideas about individual
differences were banned; psychotechnology and
industrial physiology, showing up to that time
much vitality, were “liquidated.” The Spanish
Civil War sent a few people, including Emilio 
Mira, into exile. The Italian fascists, some- 
what belatedly, began to imitate their Axis 
partners by expelling such individuals as the 
psychologist Enzo Bonaventura (Florence) 
and one of the best students of human work 
physiology, Amadeo Herlitzka (Turin). In 
Austria, Professor Karl Hackl, director of a 
large psychotechnical institute in Vienna, 
lost his job in May 1938, shortly after the 
Anschluss of Austria to Germany, because he 
was “politically unbearable” (politisch un- 
tragbar) for the new regime; the institute was 
dissolved. 

Persisting and newly created difficulties— 
monetary, ideological, international (visas)—
caused a delay of the first post-war congress
of the International Association of Psychotech- 
nics which met at Bern in 1949. In order to 
orient the congress participants as to what 
transpired in the field of psychotechnics 
during the war years, the energetic secretary 
general of the association arranged a written 
symposium on this topic. The volume will 
be welcomed by all who want to fill the gap 
in their information about the recent developments
in psychology applied to industry,
armed forces, and—to a lesser extent—educa- 
tion. Only rarely did the contributors make 
an excursion into the area of applied social 
psychology (e.g., L. Moss, of London, describ- 
ing social surveys during the war). 

This is “psychotechnics” in a considerably
narrower sense than was given it by Hugo
Münsterberg in his monumental “Grundzüge
der Psychotechnik” (1914, 767 pp.). In the
English-speaking countries the term has never
found a real acceptance. Psychotechnology
fared somewhat better. The Belgians prefer 
the term ergologie, science of work. The 
continuing dissatisfaction with the term psy- 
chotechnics is expressed by the fact that 
recently (1948) the Norwegians replaced 
officially the term psykoteknikk with arbeids- 
psykologi. 

Twenty-five countries were represented in 
the symposium, mostly by one delegate. The 
individual contributions differed considerably
in length (from 1 to 37 pages), in the amount of 
detail and in the thoroughness of coverage of 
their respective topics. The U.S.A. were 
represented by W. V. Bingham who had 
served as Chief Psychologist in the Adjutant 
General’s Office of the War Department and 
who presented the work done by psychologists 
in the armed forces, and M. S. Viteles who 
described recent trends in the use of psycholog- 
ical methods in the selection and classification 
of employees, drawing in part on the expe- 
riences made in selecting aircraft personnel. 
The work done in France, with its limited 
facilities and concentration of research in 
Paris, was portrayed by R. Bonnardel and S. 
Pacaud. There were 4 contributors from 
England, 6 from Switzerland, and 7 from Italy. 

A short note on the situation in Germany 
was presented by E. Pechtold, an engineer who 
had long served as psychotechnologist in 
large steel mills at Moravská Ostrava in
Czechoslovakia. A Czech author promised
in 1947 to take part in the symposium but in 
1949 was unable to keep his promise; as 
Hackl in 1938 in Vienna, he would likely have 
become “politically unbearable.”

In Central and Eastern Europe much of
applied psychology is facing liquidation. One 
wonders whether progress in psychotechnics 
was a proper topic for the symposium. The 
neutral term recent history, appropriate for 
ups as well as downs, would have been more 
justified. The picture of psychotechnology’s 
recent past painted by the 50 odd papers—in 
English (19), French (24), and German (10)—is 
not, and hardly could be, complete. Never- 
theless, it is the best portrait available. As 
an historical document the volume has an 
enduring value. 

At the same time, a reader familiar with 
the past and the present of the American 
applied psychology will notice little that is 
new or unique. The story, as far as technical 
matters are concerned, is largely a repetition 
of developments which have taken place also 
in the United States. 


Josef Brožek
University of Minnesota 


Hamilton, Kenneth W. Counseling the handi-
capped in the rehabilitation process. New
York: The Ronald Press Company, 1950.
Pp. iv + 296. $3.50.

The keynote of this book is contained in the 
following statement: “. . . rehabilitation of 
the handicapped is a facet of the democratic 
expression of an enlightened people’s concern 
for the dignity, worth, and participation of its 
own membership.” This, considered along 
with the country’s need for all contributing 
members, places rehabilitation in the van of 
critical problems. 

It is a thoughtful, stimulating, although 
non-technical book. It is divided into four 
sections: one deals with basic concepts, 
another with the rehabilitation process, a 
third with community resources, and a final 
one with evaluation. The 1943 amendments 
to the Barden-LaFollette Vocational Re- 
habilitation Act are included in the appendix. 

The first and third sections are particularly 
well done. Hamilton is outlining a coor- 
dinated, community-wide project aimed at 
assisting the client to attain the maximum 
competence of which he, with his individual
needs, liabilities, skills, and potentialities, is 
capable. His emphases are on understanding 
the individual and using community resources 
efficiently. 

Yet this is primarily a book on administra- 
tion, a fact which makes the title somewhat 
misleading. The discussion of counseling is 
quite elementary and will probably not do 
much for people not already aware of the 
problem. In many places, depth has been 
sacrificed for breadth. Moreover, there are 
indications of some lack of understanding of 
vocational counseling in such statements as, 
“. . . vocational guidance being of necessity
more directive and informative than vocational
counseling,” and, “Tests are primarily an aid
to the counselor.”

To implement his idea of a coordinated 
service, Hamilton has described a new kind of 
worker, the rehabilitation counselor. A 
specialized social caseworker, he is to be 
essentially a quarterback. However, his train- 
ing rests on a question of role and duties, which 
are not made entirely clear. If he is to be a
quarterback, then social casework training,
supplemented by work in vocational guidance,
would be appropriate. If, on the other hand,
he is to be a counselor in the usual sense of the
word, then he should be trained as such with
work in social casework as an adjunct. Hamil-
ton has not made the role clear enough to
prescribe training adequately.

In the discussion of evaluation, the author
has largely ignored the lack of studies dealing 
with client adjustment and satisfaction. Turn- 
over, accident rates, absenteeism, etc., are 
the criteria used, but these, while important, 
are only indirect measures of client attitudes. 
Hamilton is not to be blamed for the lack of 
such studies, but he should have recognized 
it more fully. Also, he has restricted the 
discussion to the physically handicapped, 
ignoring the large numbers of people needing 
help to readjust from emotional breakdowns. 

This book should make its greatest contribu-
tion in acquainting members of the various
interested groups with each other’s roles and 
in emphasizing the need for cooperation. 
Vocational counselors will learn little about 
their own specialty, but they should work 
better by seeing it in a setting which in-
cludes social workers, physicians, teachers, and
others. 


John W. Gustad 
Vanderbilt University 


Zahl, Paul A. (Editor). Blindness: Modern 
approaches to the unseen environment. 
Princeton (N. J.): Princeton University 
Press, 1950. Pp. 576. $7.50. 

The materials in this treatise, prepared at 
the request of the Committee on Sensory 
Devices of the National Research Council, 
were written by authorities in the various 
branches of work for the blind. Thirty-one 
authors contributed the 34 chapters. The 
aim was to obtain statements of history, 
methods, status and hopes of each author’s 
respective area of interest. 

The seven major sections are concerned with: 
(1) history and welfare, (2) education and 
psychology, (3) vocational considerations, 
(4) the military blind, (5) time-tested boons to 
the blind, (6) exploration of new technological 
horizons, and (7) ophthalmological factors.
The newer materials in the field are discussed 
in Section VI and deal mainly with develop- 
ments under the Committee on Sensory 
Devices of the National Research Council 
during World War II. Although many new 
facts were uncovered, to a large degree the 
practical use of the discoveries by blind 
persons awaits further engineering develop-
ment.

The newer trends in working with the blind 
foster the ambition of the blind person to be 
a human being whose “reserve faculties can 
be developed to compensate for the visual 
handicap.” This presents a real challenge to 
those working with the blind. The contents 
of this volume indicate that there has been 
considerable progress toward meeting this 
challenge. 

The materials in this volume manifest many 
of the shortcomings common to treatises com- 
posed of chapters written by separate authors. 
Many of these shortcomings are recognized by 
the editor. There is considerable lack of 
agreement between authors in expressed 
attitudes and opinions. In some instances, 
the same data are cited by different authors 
but are interpreted somewhat differently. 
Furthermore, organization of the discussions 
into a systematic whole was impossible. There 
is no claim that all aspects of problems
concerning the blind have been treated. 
For instance, no reference is made to the 
extensive contributions on prevention of 
blindness such as those sponsored by “The 
National Society for the Prevention of Blind-
ness.”

This book is important. For the first time, 
a major portion of the information on blindness 
is available in one place. Leaders in the 
field have brought together in summarized 
form their knowledge and experience concern- 
ing most aspects of blindness. No person 
seriously interested in problems of the blind 
can afford to neglect this book. 

Miles A. Tinker
University of Minnesota


Hollingshead, A. B. Elmtown’s youth. New
York: Wiley, 1949. Pp. 480. $5.00.

This text reports the observations of an 
anthropologist on more than 700 boys and 
girls in a midwestern community under 
study by the Committee on Human Develop- 
ment at the University of Chicago. Although
all the families in the community were stratified
into the six-fold class system of Warner and
Lunt, not all the children of these families
were covered with equal intensity by this
research. The text combines quantitative
data, including test scores, and general
anthropological data, based on observations 
of children in groups in particular situations, 
and on particular children interviewed at 
great length. 

The social psychologist, school counselor 
and personnel worker will find much that is 
challenging in the conclusions drawn by this 
method. Many psychologically trained people 
will feel that perhaps closer study would 
reveal considerable variation within the broad 
type settings described. However, a book 
based on descriptions of broad categories, 
including quotations from notes made on the 
spot, and reconstructed conversations held 
with actual people, carries an interest to the 
general reader and a sense of conviction which 
a straightforward, quantitative rehearsal would 
lack. 

A number of chapters are given to describing 
the social setting of the community, including 
the cultural characteristics of the community 
and the school system. The in-school and 
out-of-school life of youth is then described
at length. The high school as a social setting, 
dating behavior, the formation of cliques, the 
significance of religion to the youth, recreation 
in relation to adult taboos, and the youth’s 
concept of his vocational future are considered 
at length. Emphasis is placed upon the work 
experience of the adolescents in the commercial 
life of this community. The author observes 
that adolescent youth compete keenly for 
paid job experience. The industry of the 
youth who works is generally approved by 
other youth and by adults at all levels of this 
society. However, the status significance 
which work has for adults is reflected in the 
attitudes of youth. Young people engaged 
in certain kinds of jobs lose caste; others, 
more fortunate, locate part-time jobs by which 
they can enhance their status. The jobs 
which serve to enhance or depress status 
reflect adult, middle- and upper-class standards. 
The author finds that the traditional American 
attitude toward opportunity is firmly rooted 
in these youth, even though the possibility of 
realizing such opportunities is far from cor- 
related with youth’s expectation. 

Through the entire book the pervasive
influence of the class system in channelizing 
the experience of youth is reiterated. The 
effect of class structure undoubtedly is import- 
ant in our society, though many social psy- 
chologists will question some of the generaliza- 
tions. It is perhaps a matter of the particular 
evidence to which one attends. One may 
look to the central tendency, as anthropolo- 
gists have usually done, and be impressed 
with the types he finds, or he may concern 
himself with variability, in which case he will 
remark on overlapping, and blurring out of 
“types.” The chief contribution of this book 
may well be that it presents facts already 
familiar to the well-read counselor and person- 
nel worker in a broader, perhaps more con- 
vincing setting than one usually finds them— 
that of a picture of a real community. Cer- 
tainly some of the social meanings of such 
facts are more readily apparent in this setting 
than when merely presented in statistical 
array. Whether this method of elucidating 
social meaning is the only one possible is 
another matter.

Dale B. Harris
Institute of Child Welfare,
University of Minnesota